Merging parts of tuples in Python

I have a few hundred tuples in the following format (id1, id2, id3, [xydata]), for example:

('a', 'b', 'c', [(1, 2),(2, 3),(3, 4)]) 
('a', 'b', 'c', [(1, 1),(2, 4),(3, 6)]) 
('a', 'b', 'd', [(1, 3),(2, 6),(3, 7)]) 
('a', 'b', 'd', [(1, 7),(2, 8),(3, 9)])

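Now I want to merge the tuples so that those starting with the same three values are combined in the following way. The same x values are guaranteed to appear in every xydata list: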

('a', 'b', 'c', [(1, mean(2, 1)),(2, mean(3, 4)),(3, mean(4, 6))]) 
('a', 'b', 'd', [(1, mean(3, 7)),(2, mean(6, 8)),(3, mean(7, 9))])

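My current solution takes several steps: it reorders and unpacks the data, stores the tuples in a multi-level dictionary, combines them, and then rebuilds the original data structure. Is there a neat and Pythonic way to do this?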

Source

2014-03-06 pehrs

Are items 1, 2 and 3 fixed, or can they be anything? –

I can't help feeling this might be easier to do in a database. –

@Ashwini: id1, id2 and id3 are strings that identify the data. xydata is a list of the form [(integer, float)] – pehrs

You can merge them using defaultdict:

>>> l = [('a', 'b', 'c', [(1, 2),(2, 3),(3, 4)]), 
...  ('a', 'b', 'c', [(1, 1),(2, 4),(3, 6)]), 
...  ('a', 'b', 'd', [(1, 3),(2, 6),(3, 7)]), 
...  ('a', 'b', 'd', [(1, 7),(2, 8),(3, 9)])] 

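>>> from collections import defaultdict
>>> # maps (k1, k2, k3) -> x value -> list of y values
>>> d = defaultdict(lambda: defaultdict(list))
>>> for k1, k2, k3, lst in l:
...     for t in lst:
...         d[(k1, k2, k3)][t[0]].append(t[1])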

Result:

>>> d 
defaultdict(<function <lambda> at 0x8e33e9c>, 
{('a', 'b', 'c'): defaultdict(<type 'list'>, {1: [2, 1], 2: [3, 4], 3: [4, 6]}), 
('a', 'b', 'd'): defaultdict(<type 'list'>, {1: [3, 7], 2: [6, 8], 3: [7, 9]})})

If you need it as a list:

>>> [(k, v.items()) for k,v in d.items()] 
[(('a', 'b', 'c'), [(1, [2, 1]), (2, [3, 4]), (3, [4, 6])]), 
(('a', 'b', 'd'), [(1, [3, 7]), (2, [6, 8]), (3, [7, 9])])]

Computing the mean:

>>> [(k, [(n, sum(t)/float(len(t))) for n,t in v.items()]) for k,v in d.items()] 
[(('a', 'b', 'c'), [(1, 1.5), (2, 3.5), (3, 5.0)]), 
(('a', 'b', 'd'), [(1, 5.0), (2, 7.0), (3, 8.0)])]
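
Putting the steps above together, a minimal Python 3 sketch that returns the tuples in the original (id1, id2, id3, [xydata]) shape might look like the following. The function name merge_tuples is made up for illustration, and the sketch assumes, as the question states, that every tuple in a group carries the same x values.

```python
from collections import defaultdict


def merge_tuples(rows):
    """Average the y values per x for rows sharing the same three ids.

    merge_tuples is a hypothetical helper name used only for this sketch.
    """
    # (id1, id2, id3) -> x -> list of y values seen for that x
    grouped = defaultdict(lambda: defaultdict(list))
    for id1, id2, id3, xydata in rows:
        for x, y in xydata:
            grouped[(id1, id2, id3)][x].append(y)

    # Rebuild the original tuple shape, replacing each y list by its mean.
    return [
        key + ([(x, sum(ys) / len(ys)) for x, ys in sorted(by_x.items())],)
        for key, by_x in grouped.items()
    ]


rows = [
    ('a', 'b', 'c', [(1, 2), (2, 3), (3, 4)]),
    ('a', 'b', 'c', [(1, 1), (2, 4), (3, 6)]),
    ('a', 'b', 'd', [(1, 3), (2, 6), (3, 7)]),
    ('a', 'b', 'd', [(1, 7), (2, 8), (3, 9)]),
]
merged = merge_tuples(rows)
```

Because the values are collected in a dictionary keyed by the three ids, the input does not need to be sorted or grouped beforehand.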

Source

2014-03-06 16:30:20 ndpu

Very neat. I didn't know about defaultdict; it makes things much easier. – pehrs

Using itertools.groupby, izip and some list comprehensions:

from itertools import groupby, izip 
from pprint import pprint 

lis = [('a', 'b', 'c', [(1, 2), (2, 3), (3, 4)]), ('a', 'b', 'c', [(1, 1), (2, 4), (3, 6)]), ('a', 'b', 'd', [(1, 3), (2, 6), (3, 7)]), ('a', 'b', 'd', [(1, 7), (2, 8), (3, 9)])] 

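def solve(seq, X):
    # groupby only merges adjacent items, so seq must already be
    # ordered so that tuples with equal first three fields are together
    for k, g in groupby(seq, key=lambda x: x[:3]):
        # one generator of y values per tuple in the group
        data = ((y[1] for y in x[3]) for x in g)
        yield tuple(list(k) + [[(a, sum(b, 0.0) / len(b))
                                for a, b in izip(X, izip(*data))]])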

X = [a for a, _ in lis[0][3]] 
pprint(list(solve(lis, X)))

Output:

[('a', 'b', 'c', [(1, 1.5), (2, 3.5), (3, 5.0)]), 
('a', 'b', 'd', [(1, 5.0), (2, 7.0), (3, 8.0)])]
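
Since izip exists only in Python 2, here is a rough Python 3 sketch of the same groupby idea using the built-in zip. The name merge_sorted is hypothetical. The sketch sorts the input first, because groupby only merges adjacent items, and it assumes every xydata list in a group lists the same x values in the same order.

```python
from itertools import groupby
from operator import itemgetter


def merge_sorted(rows):
    """Hypothetical helper: groupby-based merge for Python 3."""
    key = itemgetter(0, 1, 2)  # returns the (id1, id2, id3) tuple
    # Sort so that rows with equal ids end up next to each other.
    for ids, group in groupby(sorted(rows, key=key), key=key):
        # zip(*...) lines up the i-th (x, y) pair of every row in the group.
        columns = zip(*(row[3] for row in group))
        yield ids + ([(pairs[0][0], sum(y for _, y in pairs) / len(pairs))
                      for pairs in columns],)


# Deliberately unsorted input to show that the sort step handles it.
shuffled = [
    ('a', 'b', 'd', [(1, 3), (2, 6), (3, 7)]),
    ('a', 'b', 'c', [(1, 2), (2, 3), (3, 4)]),
    ('a', 'b', 'd', [(1, 7), (2, 8), (3, 9)]),
    ('a', 'b', 'c', [(1, 1), (2, 4), (3, 6)]),
]
result = list(merge_sorted(shuffled))
```

Sorting costs O(n log n), so for large unsorted inputs a dictionary-based grouping may be the cheaper choice.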

Source

2014-03-06 16:30:56
