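Comparing groupby output from different DataFrames

What would be a good way to compare multiple groupby outputs?

I have groupby outputs from several different DataFrames, like the ones below.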

>>> tmp1 
    account place balance type 
0  A A1  10 B1 
1  A A1  20 B1 
2  A A1  30 B1 
3  A A1  10 B4 
4  A A1  20 B4 
5  A A1  10 B5 
6  A A1  10 B6 
7  B A2  10 B7 
8  B A2  20 B1 
9  B A2  100 B1

I run:

>>> tmp1.groupby(['account','place','type'])['balance'].last().sum(level=0).astype(int) 
    account 
    A  70 
    B  110 
    Name: balance, dtype: int64 
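
For anyone who wants to reproduce the numbers above, here is a minimal sketch that rebuilds tmp1 from the printed table and repeats the computation. It assumes a recent pandas, where `sum(level=0)` is no longer accepted (the `level` argument was deprecated in pandas 1.3 and removed in 2.0), so it uses `groupby(level=...)` as the equivalent. The names `last_balance` and `per_account` are only illustrative.

```python
import pandas as pd

# Rebuild tmp1 from the table printed above.
tmp1 = pd.DataFrame({
    "account": ["A"] * 7 + ["B"] * 3,
    "place":   ["A1"] * 7 + ["A2"] * 3,
    "balance": [10, 20, 30, 10, 20, 10, 10, 10, 20, 100],
    "type":    ["B1", "B1", "B1", "B4", "B4", "B5", "B6", "B7", "B1", "B1"],
})

# Last balance within each (account, place, type) group.
last_balance = tmp1.groupby(["account", "place", "type"])["balance"].last()

# Sum those last balances per account; this stands in for .sum(level=0).
per_account = last_balance.groupby(level="account").sum().astype(int)
print(per_account)
```

This gives 70 for account A and 110 for account B, matching the output shown above.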

Similarly 
>>> tmp2 
    account place balance type 
0  A A1  100 B1 
1  A A1  200 B1 
2  A A1  100 B1 
3  A A1  100 B4 
4  A A1  200 B4 
5  A A1  100 B5 
6  A A1  100 B6 
7  B A2  100 B7 
8  B A2  200 B1 
9  B A2  200 B1 


    >>> tmp2.groupby(['account','place','type'])['balance'].last().sum(level=0).astype(int) 
    account 
    A  500 
    B  300 
    Name: balance, dtype: int64 

    # similarly for tmp3, and so on

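Is there a way to find the DataFrame with the largest total balance? For example, in this case tmp2 has the larger total (70+110 < 500+300).

What I tried: taking the total for each DataFrame and keeping the totals in a list, like below:

mylist=[] 
mylist.append(tmp1.groupby(['account','place','type'])['balance'].last().sum().astype(int)) 
mylist.append(tmp2.groupby(['account','place','type'])['balance'].last().sum().astype(int)) 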
>>> mylist 
[180,800]

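Now I can take the maximum of the list, but that way I lose the account information (800 is the highest, but I also need to know that it is made up of 500 for account A and 300 for account B).

I also tried:

>>> tmp2.groupby(['account','place','type'])['balance'].last().sum(level=0).to_dict() 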
{'A': 500, 'B': 300}

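So for each DataFrame I have a dictionary with this information, and I just need to find the maximum over the list of such dictionaries (I think I am very close to solving it).

What I want is to find the DataFrame with the highest total, together with its per-account values.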
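
The dictionary idea described above can be sketched as follows. This is only an illustration: the per-account numbers are the ones from the question, and the `"tmp1"`/`"tmp2"` labels and the names `totals` and `best` are hypothetical, chosen here so that the winner can be traced back to its DataFrame.

```python
# Per-account totals for each DataFrame, in the form produced by .to_dict().
# The outer keys are illustrative labels for the source DataFrames.
totals = {
    "tmp1": {"A": 70, "B": 110},
    "tmp2": {"A": 500, "B": 300},
}

# Pick the label whose per-account values add up to the largest grand total.
best = max(totals, key=lambda name: sum(totals[name].values()))

# The winning dictionary still carries the per-account breakdown.
print(best, totals[best])
```

Because `max` compares the sums but returns the key, the account breakdown is not lost the way it is with a plain list of totals.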

Source

2017-07-18 pythonRcpp

If I understand correctly, you have two or more DataFrames.

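import numpy as np
import pandas as pd

tmp1 = pd.DataFrame([{'acount':'A', 'balance':100, 'type':'A1'}, 
       {'acount':'A', 'balance':200, 'type':'A2'}, 
       {'acount':'B', 'balance':200, 'type':'B1'}, 
       {'acount':'B', 'balance':300, 'type':'B2'}]) 
tmp2 = pd.DataFrame([{'acount':'A', 'balance':100, 'type':'A1'}, 
       {'acount':'A', 'balance':200, 'type':'A2'}, 
       {'acount':'B', 'balance':400, 'type':'B1'}, 
       {'acount':'B', 'balance':300, 'type':'B2'}]) 
tmplist = [tmp1,tmp2] 
# Per-account result for each DataFrame: last balance per (acount, type), summed per acount.
# On pandas >= 2.0, write .groupby(level=0).sum() instead of .sum(level=0).
tmprlist = [tmp.groupby(['acount','type']).last().sum(level=0).astype(int) for tmp in tmplist] 
# Grand total for each DataFrame (a single number per DataFrame).
tmpslist = [tmp.groupby(['acount','type'])['balance'].last().sum() for tmp in tmplist] 
# The position of the largest grand total selects the matching per-account result.
tmprlist[np.argmax(tmpslist)]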

Result:

acount balance 
A  300 
B  700
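
A possible variation on the same idea, sketched here under the assumption of a recent pandas: put the per-account totals side by side in one table with `pd.concat`, so the winning DataFrame is identified by a label instead of a list position. The helper `per_account` and the `"tmp1"`/`"tmp2"` labels are illustrative names, not part of the answer above.

```python
import pandas as pd

tmp1 = pd.DataFrame({"acount": ["A", "A", "B", "B"],
                     "balance": [100, 200, 200, 300],
                     "type": ["A1", "A2", "B1", "B2"]})
tmp2 = pd.DataFrame({"acount": ["A", "A", "B", "B"],
                     "balance": [100, 200, 400, 300],
                     "type": ["A1", "A2", "B1", "B2"]})

def per_account(df):
    # Last balance per (acount, type), then summed per acount.
    last = df.groupby(["acount", "type"])["balance"].last()
    return last.groupby(level="acount").sum()

# One column per DataFrame, one row per account.
table = pd.concat({"tmp1": per_account(tmp1), "tmp2": per_account(tmp2)}, axis=1)

# Label of the column with the largest grand total, and its per-account values.
best = table.sum().idxmax()
print(best)
print(table[best])
```

With the sample data this selects tmp2, with 300 for A and 700 for B, the same values as the result above. If the DataFrames do not share the same accounts, the table will contain NaN for the missing ones, which `sum` skips by default.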

Source

2017-07-18 08:47:46

This works for me! Could you add comments to the commands? – pythonRcpp
