Python - group by and sum a list of tuples

Given the following list of tuples:

[ 
    ('A', '', Decimal('4.0000000000'), 1330, datetime.datetime(2012, 6, 8, 0, 0)), 
    ('B', '', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 6, 4, 0, 0)), 
    ('AA', 'C', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 5, 31, 0, 0)), 
    ('B', '', Decimal('7.0000000000'), 1330, datetime.datetime(2012, 5, 24, 0, 0)), 
    ('A', '', Decimal('21.0000000000'), 1330, datetime.datetime(2012, 5, 14, 0, 0)) 
]

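I want to group these by the first, second, fourth, and fifth columns of each tuple and sum the third. For this example I'll name the columns col1, col2, col3, col4, col5.

In SQL I would do something like this: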

select col1, col2, sum(col3), col4, col5 from my_table 
group by col1, col2, col4, col5

Is there a "cool" way to do this, or is it all manual looping?

2012-06-15 jbassking10

>>> import itertools, operator
>>> key = operator.itemgetter(0, 1, 3, 4)  # group on col1, col2, col4, col5
>>> [(k[0:2] + (sum(row[2] for row in rows),) + k[2:4])
...  for (k, rows) in itertools.groupby(sorted(L, key=key), key=key)]
[ 
    ('A', '', Decimal('21.0000000000'), 1330, datetime.datetime(2012, 5, 14, 0, 0)), 
    ('A', '', Decimal('4.0000000000'), 1330, datetime.datetime(2012, 6, 8, 0, 0)), 
    ('AA', 'C', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 5, 31, 0, 0)), 
    ('B', '', Decimal('7.0000000000'), 1330, datetime.datetime(2012, 5, 24, 0, 0)), 
    ('B', '', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 6, 4, 0, 0)) 
]

(Note: output reformatted for readability.)
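In the sample data every row has a distinct date, so each group contains a single row and nothing is actually added together. The following is a minimal sketch of the same groupby approach on a hypothetical variation of the data, where two rows share col1, col2, col4, and col5; the names `rows`, `group_key`, and `summed` are illustrative and not from the original answer.

```python
import datetime
import itertools
import operator
from decimal import Decimal

# Hypothetical data: the first and last rows share col1, col2, col4 and col5.
rows = [
    ("A", "", Decimal("4"), 1330, datetime.datetime(2012, 6, 8)),
    ("B", "", Decimal("31"), 1330, datetime.datetime(2012, 6, 4)),
    ("A", "", Decimal("21"), 1330, datetime.datetime(2012, 6, 8)),
]

# Key function that picks col1, col2, col4, col5 out of each row.
group_key = operator.itemgetter(0, 1, 3, 4)

# Sort by the key so equal keys are adjacent, then sum col3 per group
# and splice the total back into the third position.
summed = [
    key[:2] + (sum(row[2] for row in group),) + key[2:]
    for key, group in itertools.groupby(sorted(rows, key=group_key), key=group_key)
]

for row in summed:
    print(row)
```

Here the two "A" rows collapse into one row whose third column is their total, while the "B" row passes through unchanged.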

2012-06-15 20:56:39

This works even better - thanks! – jbassking10

You want itertools.groupby.

Note that groupby expects its input to be sorted, so you may need to sort it beforehand:

import itertools

keyfunc = lambda t: (t[0], t[1], t[3], t[4])  # col1, col2, col4, col5
data.sort(key=keyfunc)  # groupby only merges adjacent rows, so sort first
for key, rows in itertools.groupby(data, keyfunc):
    print(key, sum(r[2] for r in rows))
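To illustrate why the sort matters, here is a small sketch using simplified, hypothetical two-column rows (a key and an amount) rather than the five-column data from the question. The helper `totals` is made up for this example. groupby starts a new group every time the key changes, so unsorted input can yield the same key more than once.

```python
import itertools

# Hypothetical two-column rows: (key, amount).
rows = [("A", 4), ("B", 31), ("A", 21)]


def totals(data):
    """Sum the amounts for each run of consecutive rows with the same key."""
    return [
        (key, sum(amount for _, amount in group))
        for key, group in itertools.groupby(data, key=lambda r: r[0])
    ]


# Without sorting, "A" shows up as two separate groups.
unsorted_totals = totals(rows)

# After sorting by the key, each key forms exactly one group.
sorted_totals = totals(sorted(rows, key=lambda r: r[0]))

print(unsorted_totals)
print(sorted_totals)
```

The unsorted call reports "A" twice with partial sums, whereas the sorted call reports one total per key.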

2012-06-15 20:54:17

'operator.itemgetter(0, 1, 3, 4)' – JBernardo

Thanks - works perfectly! – jbassking10

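If you find yourself doing this a lot with large datasets, you might want to look at the pandas library, which has a lot of good tools for this kind of thing.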
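As a rough sketch of what that could look like, assuming pandas is installed: the data below is a hypothetical variation of the question's list in which two rows share the grouping columns, and the amounts are plain floats rather than Decimal values to keep the example simple. The column names follow the col1 to col5 naming from the question.

```python
import datetime

import pandas as pd

# Hypothetical data: floats stand in for the Decimal amounts in the question.
rows = [
    ("A", "", 4.0, 1330, datetime.datetime(2012, 6, 8)),
    ("B", "", 31.0, 1330, datetime.datetime(2012, 6, 4)),
    ("A", "", 21.0, 1330, datetime.datetime(2012, 6, 8)),
]

df = pd.DataFrame(rows, columns=["col1", "col2", "col3", "col4", "col5"])

# Equivalent of: select ..., sum(col3) ... group by col1, col2, col4, col5
summed = (
    df.groupby(["col1", "col2", "col4", "col5"])["col3"]
    .sum()
    .reset_index()  # turn the group keys back into ordinary columns
)

print(summed)
```

Unlike itertools.groupby, the pandas groupby does not need the rows to be sorted first. Note that the summed column ends up last in the result, after the four key columns.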

2012-06-15 21:16:43 BrenBarn
