2017-02-18 64 views
1

Grouping a list of sublists by certain values in each sublist

I have a dictionary whose values are lists of lists, for example:

{1: [[sender11, receiver11, text11, address11]], 
2: [[sender21, receiver21, text21, address21], [sender22, receiver22, text22, address22]], 
3: [[sender31, receiver31, text31, address31], [sender32, receiver32, text32, address32], [sender33, receiver33, text33, address33]],
4: [[sender41, receiver41, text41, address41], [sender42, receiver42, text42, address42], [sender43, receiver43, text43, address43], [sender44, receiver44, text44, address44]]} 

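What I want to do is this: for the dictionary entries whose lists contain 2 or more elements (i.e. dict[2], dict[3] and dict[4] in this example), compare the sender, receiver, text values of each sublist. For each group of sublists with the same sender, receiver, text values, I will do something.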

So, for example, in dict[3], if sender31, receiver31, text31, sender32, receiver32, text32 and sender33, receiver33, text33 all have the same values, then I will do something with all 3 sublists.

Say, in dict[4], sender41, receiver41, text41 have the same values as sender42, receiver42, text42, while sender43, receiver43, text43 have the same values as sender44, receiver44, text44 but differ from sender41, receiver41, text41. Then I will work on these 2 groups separately.

I wrote a Python script that more or less brute-force compares the values of sender21, receiver21, text21 with sender22, receiver22, text22, i.e.

if sender21 == sender22 and receiver21 == receiver22 and text21 == text22: 
    # Do something 

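This is not a good approach, because it only works for 2 sublists, and I don't know how to implement it so that it works for any number of sublists greater than 1.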

Answers

1

I think defaultdict is the obvious way to go here:

from collections import defaultdict 

def collate(seq): 
    groups = defaultdict(list) 
    for subseq in seq: 
        groups[tuple(subseq[:3])].append(subseq[3]) 
    return groups 

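Depending on your actual data, you might replace tuple(subseq[:3]) in the function above with something like (subseq[1], subseq[4], subseq[5]), or append subseq itself rather than subseq[3]. It all depends on what you're doing with the data.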

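Note, however, that the key must be a tuple rather than a list: dictionary keys must be hashable, and lists, being mutable, are not.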
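To illustrate the point about keys, here is a minimal sketch (written for Python 3; the names groups, row and list_key_ok are only illustrative) of what happens when a list slice is used as a dictionary key, compared with a tuple:

```python
groups = {}
row = ['S1', 'R1', 'T1', 'A3']

# A list is mutable and therefore unhashable, so it is rejected as a key.
try:
    groups[row[:3]] = [row[3]]
    list_key_ok = True
except TypeError:
    list_key_ok = False

# Converting the slice to a tuple gives a hashable key.
groups[tuple(row[:3])] = [row[3]]

print(list_key_ok)  # False
print(groups)       # {('S1', 'R1', 'T1'): ['A3']}
```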

Example:

>>> data = [ 
...     ['S1', 'R1', 'T1', 'A3'], 
...     ['S2', 'R2', 'T2', 'A4'], 
...     ['S1', 'R1', 'T1', 'A5'], 
...     ['S2', 'R2', 'T2', 'A6'] 
... ] 

>>> collate(data) 
defaultdict(<type 'list'>, { 
    ('S2', 'R2', 'T2'): ['A4', 'A6'], 
    ('S1', 'R1', 'T1'): ['A3', 'A5'] 
}) 

You can work with this just like any other dictionary, e.g.

>>> for (sender, receiver, text), addresses in collate(data).items(): 
...     print sender, receiver, text 
...     print '|'.join(addresses) 
...     print 
... 
S2 R2 T2 
A4|A6 

S1 R1 T1 
A3|A5 
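The example above groups one flat list, whereas the question has a dictionary of such lists and only cares about groups with more than one member. The following is a rough sketch of how the same idea might be applied there. It is written for Python 3, and collate_rows, matching_groups and the sample messages dictionary are hypothetical names and data made up for illustration, not part of the original answer. This variant appends the whole sublist rather than only the address.

```python
from collections import defaultdict

def collate_rows(rows):
    """Group whole rows by their first three fields (sender, receiver, text)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[:3])].append(row)
    return groups

def matching_groups(messages):
    """Yield (dict_key, group_key, rows) for every group with 2 or more rows."""
    for dict_key, rows in messages.items():
        for group_key, members in collate_rows(rows).items():
            if len(members) > 1:
                yield dict_key, group_key, members

# Made-up sample data shaped like the dictionary in the question.
messages = {
    1: [['S1', 'R1', 'T1', 'A1']],
    4: [['S1', 'R1', 'T1', 'A2'], ['S1', 'R1', 'T1', 'A3'],
        ['S2', 'R2', 'T2', 'A4'], ['S2', 'R2', 'T2', 'A5']],
}

result = list(matching_groups(messages))
for dict_key, group_key, members in result:
    # "Do something" with each group; here we just list the addresses.
    print(dict_key, group_key, [m[3] for m in members])
```

The entry under key 1 has only one sublist, so it produces no group; the entry under key 4 produces two independent groups.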
  
+0

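Thanks! This works well. However, what if I now want '(sender, receiver, text)' and '(receiver, sender, text)' to end up in the same group, rather than requiring an exact match on '(sender, receiver, text)', i.e. the order of sender and receiver doesn't matter? Is this possible? Do I need to hash it? – Rayne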

+1

The set type that a) is immutable and b) doesn't care about order is 'frozenset', so something like 'groups[frozenset(subseq[:2]), subseq[2]].append(subseq[3])' sounds about right; adjust as necessary. –
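A minimal sketch of the frozenset suggestion, written for Python 3. The function name collate_unordered and the sample data are assumptions made for illustration; the key expression is the one from the comment above.

```python
from collections import defaultdict

def collate_unordered(seq):
    """Group rows so that (sender, receiver) and (receiver, sender) match."""
    groups = defaultdict(list)
    for subseq in seq:
        # frozenset ignores order and is hashable; text stays a separate key part.
        key = (frozenset(subseq[:2]), subseq[2])
        groups[key].append(subseq[3])
    return groups

data = [
    ['S1', 'R1', 'T1', 'A3'],
    ['R1', 'S1', 'T1', 'A4'],  # same pair, opposite direction
    ['S1', 'R1', 'T2', 'A5'],  # same pair, different text
]

unordered = collate_unordered(data)
print(unordered[(frozenset(['S1', 'R1']), 'T1')])  # ['A3', 'A4']
print(unordered[(frozenset(['S1', 'R1']), 'T2')])  # ['A5']
```

One thing to keep in mind: if a sender and receiver are the same value, the frozenset collapses to a single element. It is still a valid key, but the key no longer records that there were two fields.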

+0

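By the way, this is the kind of thing you should either know or be able to find quickly in the documentation in order to be an effective programmer. Read https://docs.python.org/2/library/stdtypes.html over and over until you know it cold; the standard types will repay your effort massively in the long run. –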