Pandas DataFrame: group by multiple columns, then sum

Assume the following imports for all of the Python code below:

import pandas as pd 
import numpy as np

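In pandas, if I have a DataFrame with 2 columns, one of which holds numeric arrays, I can sum over that column to obtain a single array (the element-wise sum of all the arrays).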

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'], 'numbers' : [np.array([1, 2, 3, 4]),np.array([2, 4, 2, 4]),np.array([2, 3, 4, 5]),np.array([1, 3, 5, 7])]}) 
df['numbers'].sum()
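To make the element-wise behaviour explicit, here is a minimal sketch that stacks the arrays into a 2-D matrix and sums down the rows. It assumes every array in the column has the same length; the names stacked and total are illustrative and not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'numbers': [np.array([1, 2, 3, 4]), np.array([2, 4, 2, 4]),
                np.array([2, 3, 4, 5]), np.array([1, 3, 5, 7])],
})

# Stack the four length-4 arrays into a (4, 4) matrix, one row per DataFrame row.
stacked = np.stack(df['numbers'].tolist())

# Summing along axis 0 adds the arrays position by position.
total = stacked.sum(axis=0)
print(total)  # [ 6 12 14 20]
```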

I can even group by the first column and then sum the second column to get a sum for each group:

grpA = df.groupby('A') 
grpA.sum()

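However, if I have more columns besides the array column, say two other columns, then I get a ValueError: Function does not reduce when I try to group by the first two columns and sum the array column: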

df2 = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],'B': ['la', 'la', 'al', 'al'],'numbers' : [np.array([1, 2, 3, 4]),np.array([2, 4, 2, 4]),np.array([2, 3, 4, 5]),np.array([1, 3, 5, 7])]}) 
grpAB = df2.groupby(['A','B']) 
grpAB.sum()

In SQL, the following would work, if SQL could sum arrays:

select A, B, sum(numbers) 
    from df2 
    group by A, B

Is there a way in pandas to group by multiple columns and sum the remaining array column?

Source

2015-09-02 intdt

One possible solution is:

df2 = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],'B': ['la', 'la', 'al', 'al'],'numbers' : [np.array([1, 2, 3, 4]),np.array([2, 4, 2, 4]),np.array([2, 3, 4, 5]),np.array([1, 3, 5, 7])]}) 

grouped = df2.groupby(['A','B']) 

#set up empty arrays to append data from below loop 
array=[] 
index=[] 

#loop through the grouped data and sum up the array numbers 
for i,j in grouped: 
    array.append({'numbers':j.numbers.sum()}) 
    index.append(i) 

#put summed array back into a dataframe 
print(pd.DataFrame(array, index=index))
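The same loop can be wrapped in a small reusable function. The sketch below is one way to do that, not the answerer's own code: sum_arrays_by_group is a hypothetical helper name, and it assumes all arrays within a group share the same shape. It stacks each group's arrays with NumPy and labels the result with a named MultiIndex.

```python
import numpy as np
import pandas as pd


def sum_arrays_by_group(frame, keys, column):
    """Element-wise sum of the arrays in `column` for each group of `keys`.

    Hypothetical helper; assumes all arrays in a group share one shape.
    """
    index, rows = [], []
    for key, group in frame.groupby(keys):
        index.append(key)
        # Stack the group's arrays and add them position by position.
        rows.append(np.stack(group[column].tolist()).sum(axis=0))
    return pd.DataFrame({column: rows},
                        index=pd.MultiIndex.from_tuples(index, names=keys))


df2 = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': ['la', 'la', 'al', 'al'],
    'numbers': [np.array([1, 2, 3, 4]), np.array([2, 4, 2, 4]),
                np.array([2, 3, 4, 5]), np.array([1, 3, 5, 7])],
})

summed = sum_arrays_by_group(df2, ['A', 'B'], 'numbers')
print(summed)
```

With the question's data every (A, B) pair occurs once, so each group's sum is just its own array; the function matters once a pair appears in more than one row.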

Source

2015-09-02 21:29:44

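Hi Tom, it doesn't look like this works. It only outputs a single array, equivalent to df2['array'].sum(). But you gave me an idea about apply. Let me see if I can figure something out. – intdt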

Hi, sorry, I misunderstood the question. I've edited the answer, and it should now be close to what you're looking for. –

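You can use a lambda expression. The iat accessor takes the scalar value of the first element in each group's Series (here, simply the array of numbers), and the result is then summed.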

>>> df2.groupby(['A', 'B']).numbers.apply(lambda x: x.iat[0].sum()) 

A    B 
bar  al    16
     la    12
foo  al    14
     la    10
Name: numbers, dtype: int64
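One caveat is easier to see with data where a group holds more than one row. The sketch below uses a hypothetical frame df3 (not from the question) in which the ('foo', 'la') pair appears twice. In that case iat[0] reads only the first array of the group, and .sum() collapses that array to a scalar rather than producing an element-wise array. The dictionary named elementwise is an illustrative alternative for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the ('foo', 'la') group has two rows.
df3 = pd.DataFrame({
    'A': ['foo', 'foo', 'bar'],
    'B': ['la', 'la', 'al'],
    'numbers': [np.array([1, 2, 3, 4]), np.array([10, 20, 30, 40]),
                np.array([1, 3, 5, 7])],
})

grouped = df3.groupby(['A', 'B'])['numbers']

# Scalar total of the FIRST array in each group only.
first_row_totals = grouped.apply(lambda x: x.iat[0].sum())

# Element-wise sum over ALL arrays in each group, collected in a plain dict.
elementwise = {key: np.stack(group.tolist()).sum(axis=0)
               for key, group in grouped}

print(first_row_totals.loc[('foo', 'la')])  # 10, the second row is ignored
print(elementwise[('foo', 'la')])           # [11 22 33 44]
```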

Source

2015-09-03 00:04:33 Alexander

df2 = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],'B': ['la', 'la', 'al', 'al'],'numbers' : [np.array([1, 2, 3, 4]),np.array([2, 4, 2, 4]),np.array([2, 3, 4, 5]),np.array([1, 3, 5, 7])]}) 


df2

Out[42]: 
     A   B       numbers
0  foo  la  [1, 2, 3, 4]
1  bar  la  [2, 4, 2, 4]
2  foo  al  [2, 3, 4, 5]
3  bar  al  [1, 3, 5, 7]

grpAB = df2.groupby(['A','B']) 
res = grpAB.apply(lambda x : x.numbers.sum()) 


res

Out[43]: 
A    B 
bar  al    [1, 3, 5, 7]
     la    [2, 4, 2, 4]
foo  al    [2, 3, 4, 5]
     la    [1, 2, 3, 4]
dtype: object

pd.DataFrame(res, columns=['numbers'])

Out[44]: 
             numbers
A   B               
bar al  [1, 3, 5, 7]
    la  [2, 4, 2, 4]
foo al  [2, 3, 4, 5]
    la  [1, 2, 3, 4]

# if you want to reset the index
pd.DataFrame(res, columns=['numbers']).reset_index()


Out[45]: 
     A   B       numbers
0  bar  al  [1, 3, 5, 7]
1  bar  la  [2, 4, 2, 4]
2  foo  al  [2, 3, 4, 5]
3  foo  la  [1, 2, 3, 4]
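An alternative sketch, not taken from any of the answers above, avoids arithmetic on object-dtype columns altogether: expand the arrays into one numeric column per position, let the built-in groupby sum do the work, and pack each summed row back into an array. It assumes all arrays have the same length; the names wide, summed and result are illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': ['la', 'la', 'al', 'al'],
    'numbers': [np.array([1, 2, 3, 4]), np.array([2, 4, 2, 4]),
                np.array([2, 3, 4, 5]), np.array([1, 3, 5, 7])],
})

# One numeric column per array position, aligned with df2's rows.
wide = pd.DataFrame(np.stack(df2['numbers'].tolist()), index=df2.index)

# Group the numeric columns by the key columns and use the built-in sum.
summed = wide.groupby([df2['A'], df2['B']]).sum()

# Pack each summed row back into a single array per group.
result = pd.Series(list(summed.to_numpy()), index=summed.index, name='numbers')
print(result.reset_index())
```

Because the grouping runs on plain integer columns, this stays within ordinary numeric aggregation; the cost is the extra step of converting to and from the wide layout.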

Source

2015-09-03 06:01:10
