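Getting the highest counts from a combination of tuples

I have a list of tuples. Each tuple is a key-value pair, where the key is a number and the value is a string of characters. For each key, I need to return the two most frequent characters and their counts, as a list.

For example, given the list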

[(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]

the keys are 1 and 2, and the values are

"aabbc", "babdea", ..., "acdad"

The tuples can be converted into tuples of the form

(1, {"a":2, "b":2, "c":1}),(1,{"a":2, "b":2, "d":1,"e":1})...(2,{"a":2, "c":1, "d":2})

For key 1, the combined tuple would be

(1,{"a":4, "b":4, "c":1, "d":1,"e":1})

so the top two characters and their counts are

[("a",4),("b",4)]
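As an illustration of the merge for key 1, here is a minimal sketch using collections.Counter, whose + operator sums the counts of matching characters. The variable names are made up for this example and do not come from the code in the question.

```python
from collections import Counter

# Per-string character counts for the two values that share key 1.
first = Counter("aabbc")    # a:2, b:2, c:1
second = Counter("babdea")  # a:2, b:2, d:1, e:1

# Adding two Counters sums the counts of matching characters.
combined = first + second

# most_common(2) returns the two (character, count) pairs with the highest counts.
top_two = combined.most_common(2)
print(top_two)
```

Here "a" and "b" both end up with a count of 4, so they are the two entries returned.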

The process is repeated for each key.

I can get the output I want, but I am looking for a better solution.

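from collections import Counter
import itertools as it

l=[(x[0],list(x[1])) for x in [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]]
l2=[(y[0],Counter(y[1])) for y in l]

# Pair up the Counters that share a key (works as intended only when each key has exactly two tuples)
l3=[(x[0][1],x[1][1]) for x in it.combinations(l2,2) if x[0][0]==x[1][0] ]

l4=[]
for t,y in l3:
    d={}
    # Sum the counts over every character seen in either Counter
    l5=list(set(t.keys()).union(y.keys()))
    for i in l5:
        d[i]=t[i]+y[i]
    # Keep the two characters with the highest counts
    d_sort=sorted(d.items(), key=lambda x: x[1], reverse=True)[:2]

    l4.append(d_sort)


print(l4)
[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]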

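Source

2017-05-09 mikeL

Are the keys in your list sorted? – dawg

You can also concatenate the strings that share the same key, then count the characters and extract the two most common ones: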

import collections 

data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")] 

groups = collections.defaultdict(str) 
for i, s in data: 
    groups[i] += s 

print([collections.Counter(string).most_common(2) 
     for string in groups.values()])

You will get:

[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]

Source

2017-05-09 18:19:25

This is the way I would write this... – dawg

I would use a defaultdict holding Counters, which are updated while iterating through your list of tuples:

>>> from collections import Counter, defaultdict 
>>> data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")] 
>>> 
>>> result = defaultdict(Counter) 
>>> for num, letters in data: 
...  result[num].update(letters) 
... 
>>> result 
defaultdict(<class 'collections.Counter'>, {1: Counter({'a': 4, 'b': 4, 'c': 1, 'e': 1, 'd': 1}), 2: Counter({'a': 5, 'c': 3, 'd': 2, 'b': 1})})

To get the two most common letters, Counter objects have a helpful method, most_common.

>>> {k:v.most_common(2) for k,v in result.items()} 
{1: [('a', 4), ('b', 4)], 2: [('a', 5), ('c', 3)]}
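If this is needed in more than one place, the same idea could be wrapped in a small function. The sketch below is only one way to do it: the name top_chars_by_key and the parameter n are hypothetical, and the sample input is made up to show that the keys need not be sorted and that a key may have more than two strings.

```python
from collections import Counter, defaultdict


def top_chars_by_key(pairs, n=2):
    """Hypothetical helper: map each key to its n most common characters."""
    totals = defaultdict(Counter)
    for key, text in pairs:
        # Counter.update adds this string's character counts to the running total.
        totals[key].update(text)
    return {key: counter.most_common(n) for key, counter in totals.items()}


# Made-up input: keys are out of order and key 1 has three strings.
sample = [(1, "aabbc"), (2, "aabacc"), (1, "babdea"), (2, "acdad"), (1, "aaa")]
result = top_chars_by_key(sample)
print(result)
```

Because the counts are accumulated per key in a single pass, the order of the input does not matter.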

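Source

2017-05-09 18:04:46 timgeb

You can then use 'Counter.most_common(2)' to get the most common letters of each counter. –

@LaurentLAPORTE I overlooked that, yes. It is a bit more complicated because OP wants all of the most common elements of each Counter. Working on it... – timgeb

He wants the two most common: '[c.most_common(2) for c in result.values()]' –

Not quite as nice, but shorter: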

from itertools import groupby 
from collections import Counter 


lst = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")] 

[Counter(''.join(list(zip(*y[1]))[1])).most_common(2) for y in groupby(lst, key=lambda x: x[0])] 

# [[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]
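One caveat: itertools.groupby only groups adjacent items, so the one-liner above relies on the list already being ordered by key. A minimal sketch for unsorted input, assuming it is acceptable to sort first (the shuffled list and the variable names are made up for illustration):

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Same data as above, but deliberately out of key order.
unsorted_lst = [(2, "aabacc"), (1, "aabbc"), (2, "acdad"), (1, "babdea")]

# groupby only merges adjacent items, so sort by key first.
ordered = sorted(unsorted_lst, key=itemgetter(0))

# Join the strings of each group, then take the two most common characters.
result = [
    Counter("".join(text for _, text in group)).most_common(2)
    for _, group in groupby(ordered, key=itemgetter(0))
]
print(result)
```

Sorting costs O(n log n), so for large unsorted inputs a dictionary-based grouping may be the simpler choice.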

I hope this helps.

Source

2017-05-09 18:13:15 Abdou

If the list is not sorted, I would do:

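from collections import Counter

data = [(1, "aabbc"), (1, "babdea"), (2, "aabacc"), (2, "acdad")]

di={}
for i, s in data:
    # Start each key with an empty Counter, then add this string's counts
    di.setdefault(i, Counter())
    di[i]+=Counter(s)

print([c.most_common(2) for _,c in sorted(di.items())])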

If it is already sorted, you can use groupby and reduce:

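from collections import Counter
from functools import reduce  # reduce is not a builtin in Python 3
from itertools import groupby

li=[]
for k, g in groupby(data, key=lambda t: t[0]):
    # Sum the per-string Counters within each group of equal keys
    li.append(reduce(lambda x,y: x+y, (Counter(t[1]) for t in g)).most_common(2))

print(li)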

In both cases, this prints:

[[('a', 4), ('b', 4)], [('a', 5), ('c', 3)]]

Source

2017-05-09 20:31:25 dawg
