2017-05-09 25 views
1

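I have a list of tuples. Each tuple is a key-value pair, where the key is a number and the value is a string of characters. For each key, I need to return the top two characters and their counts as a list. The goal is to get the highest counts from the combined tuples.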

For example, given the list

[(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")] 

the keys are 1 and 2, and the values are

"aabbc", "babdea", ..., "acdad" 

The tuples can be transformed into tuples of the form

(1, {"a":2, "b":2, "c":1}),(1,{"a":2, "b":2, "d":1,"e":1})...(2,{"a":2, "c":1, "d":2}) 

For key 1, the combined tuple would be

(1,{"a":4, "b":4, "c":1, "d":1,"e":1}) 

so the top two characters with their counts are

[("a",4),("b",4)] 

The process is repeated for each key.

I am able to get the output I want, but I am looking for a better solution.

from collections import Counter 
import itertools as it 

# Split each string into characters, then count them per tuple 
l=[(x[0],list(x[1])) for x in [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]] 
l2=[(y[0],Counter(y[1])) for y in l] 

# Pair up the counters that share a key 
l3=[(x[0][1],x[1][1]) for x in it.combinations(l2,2) if x[0][0]==x[1][0] ] 

l4=[] 
for t,y in l3: 
    d={} 
    l5=list(set(t.keys()).union(y.keys())) 
    for i in l5: 
        d[i]=t[i]+y[i] 
    # Keep the two entries with the highest combined count 
    d_sort=sorted(d.items(), key=lambda x: x[1], reverse=True)[:2] 

    l4.append(d_sort) 


print(l4) 
[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]] 
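
One caveat with the pairwise approach above: `itertools.combinations(l2, 2)` yields every pair of tuples, so it only gives one combined result per key when each key has exactly two strings. The sketch below reproduces the same idea on made-up input to show what happens otherwise. The function name `pairwise_results` and the sample data are hypothetical and not part of the original code.

```python
import itertools
from collections import Counter


def pairwise_results(pairs):
    """Combine counts for every pair of tuples that share a key."""
    counted = [(key, Counter(text)) for key, text in pairs]
    results = []
    for (k1, c1), (k2, c2) in itertools.combinations(counted, 2):
        if k1 == k2:
            # Counter addition sums the counts of both operands
            results.append((c1 + c2).most_common(2))
    return results


# Hypothetical input: three strings under the same key
three_strings = [(1, "aa"), (1, "ab"), (1, "bb")]
print(len(pairwise_results(three_strings)))  # three partial results, not one

# Hypothetical input: a key with a single string has no pair at all
print(pairwise_results([(1, "aa")]))
```

With three strings under one key, each of the three pairs contributes its own partial top-two list, and a key with a single string is dropped entirely.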
+0

Is your list sorted by key? – dawg

Answers

2

You can also concatenate the strings that share the same key, then count the characters and extract the two most common ones:

import collections 

data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")] 

groups = collections.defaultdict(str) 
for i, s in data: 
    groups[i] += s 

print([collections.Counter(string).most_common(2) 
     for string in groups.values()]) 

You will get:

[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]] 
+0

This is the way I would write this... – dawg

0

I would use a defaultdict holding Counters that are updated while iterating through your list of tuples:

>>> from collections import Counter, defaultdict 
>>> data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")] 
>>> 
>>> result = defaultdict(Counter) 
>>> for num, letters in data: 
...  result[num].update(letters) 
... 
>>> result 
defaultdict(<class 'collections.Counter'>, {1: Counter({'a': 4, 'b': 4, 'c': 1, 'e': 1, 'd': 1}), 2: Counter({'a': 5, 'c': 3, 'd': 2, 'b': 1})}) 

To get the two most common letters, Counter objects have a useful method, most_common:

>>> {k:v.most_common(2) for k,v in result.items()} 
{1: [('a', 4), ('b', 4)], 2: [('a', 5), ('c', 3)]} 
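
For key 1, 'a' and 'b' are tied at a count of 4. In recent Python versions `most_common` returns tied elements in the order they were first encountered, so the result can depend on the order of the input strings. If a reproducible order among tied characters matters, one option is to sort explicitly. This is only a sketch: `top_n_stable` is a hypothetical helper name, and breaking ties alphabetically is an assumption about what you might want.

```python
from collections import Counter


def top_n_stable(counter, n=2):
    """Return the n highest counts, breaking ties alphabetically."""
    # Sort by descending count first, then by the character itself
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Same characters as key 1, but with the strings in the opposite order
counts = Counter("babdea") + Counter("aabbc")
print(top_n_stable(counts))
```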
+0

Then you can use 'Counter.most_common(2)' to get the most common letters of each counter. –

+0

@LaurentLAPORTE I overlooked that, yes. It's a bit more complicated because OP wants all the most common elements of each Counter. Working on it... – timgeb

+1

He wants the two most common: '[c.most_common(2) for c in result.values()]' –

0

Not quite as good, but shorter:

from itertools import groupby 
from collections import Counter 


lst = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")] 

[Counter(''.join(list(zip(*y[1]))[1])).most_common(2) for y in groupby(lst, key=lambda x: x[0])] 

# [[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]] 
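
Note that `groupby` only merges adjacent items with equal keys, so the one-liner above relies on `lst` already being ordered by key. A minimal sketch that sorts first and keeps each key next to its result is shown below. The name `top_two_grouped` and the shuffled sample list are hypothetical.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def top_two_grouped(pairs):
    """Sort by key, group adjacent tuples, and count the joined strings."""
    # sorted() is stable, so strings keep their relative order within a key
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, Counter("".join(s for _, s in group)).most_common(2))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


# Same tuples as above, deliberately out of order
unsorted_lst = [(2, "aabacc"), (1, "aabbc"), (2, "acdad"), (1, "babdea")]
print(top_two_grouped(unsorted_lst))
```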

I hope this helps.

0

If the list is not sorted, I would do:

from collections import Counter 

data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")] 

di={} 
for i, s in data: 
    di.setdefault(i, Counter()) 
    di[i]+=Counter(s) 

print([c.most_common(2) for _,c in sorted(di.items())]) 

If it is already sorted, you can use groupby and reduce:

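from collections import Counter 
from functools import reduce  # reduce is not a builtin in Python 3 
from itertools import groupby 

li=[] 
for k, g in groupby(data, key=lambda t: t[0]): 
    # Add up the Counters of every string in the group 
    li.append(reduce(lambda x,y: x+y, (Counter(t[1]) for t in g)).most_common(2)) 

print(li) 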

In both cases, this prints:

[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]
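
As a possible variation on the reduce version, the built-in `sum` can add the Counters when it is given an empty `Counter()` as its start value. This is a sketch of an alternative rather than a claim that it is better, and it assumes `data` is already ordered by key, as in the groupby version.

```python
from collections import Counter
from itertools import groupby

data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]

li = []
for k, g in groupby(data, key=lambda t: t[0]):
    # Starting from an empty Counter lets sum() add the Counters in turn
    total = sum((Counter(s) for _, s in g), Counter())
    li.append(total.most_common(2))

print(li)
```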