2017-05-16

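Python Pandas: add a column to a multi-index GroupBy DataFrame

I want to add a column to a multi-index Pandas GroupBy DataFrame. The new column should be the difference between the maximum and the mean of Reads within each group.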

Here is the input DataFrame:

   Main  Reads  Test  Subgroup
0     1      5    54         1
1     2      2    55         1
2     1     10    56         2
3     2     20    57         3
4     1      7    58         3

Here is the code:

import numpy as np 
import pandas as pd 

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})

result = df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean]}) 

Here is the variable result before the diff calculation:

              Reads
               amax mean
Main Subgroup
1    1            5    5
     2           10   10
     3            7    7
2    1            2    2
     3           20   20

Next, I calculate the diff column:

result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean'] 

But this is the output:

/home/userd/test.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean']
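A minimal sketch of why the chained assignment has no effect. It assumes a recent pandas and uses the string aggregations 'max'/'mean' instead of np.max/np.mean so the labels do not depend on the NumPy version. result['Reads'] builds a separate sub-DataFrame, so the new column lands on that temporary object rather than on result. Warnings are silenced only because their exact type and wording vary across pandas versions.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})
result = df.groupby(['Main', 'Subgroup']).agg({'Reads': ['max', 'mean']})

with warnings.catch_warnings():
    # Hide SettingWithCopyWarning / chained-assignment warnings; their
    # exact form depends on the pandas version.
    warnings.simplefilter("ignore")
    # result['Reads'] is a new sub-DataFrame; 'diff' is added to it only.
    result['Reads']['diff'] = result['Reads']['max'] - result['Reads']['mean']

# The new column never reaches result itself.
print(('Reads', 'diff') in result.columns)
```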

I want the diff column to be on the same level as amax and mean.

Is there a way to add a column to the innermost (bottom) level of the column index of a multi-index GroupBy() result in Pandas?

Answers

# You can use a lambda to build diff directly.
df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean,lambda x: np.max(x)-np.mean(x)]}).rename(columns={'<lambda>':'diff'}) 
Out[2360]: 
              Reads
               amax mean diff
Main Subgroup
1    1            5    5    0
     2           10   10    0
     3            7    7    0
2    1            2    2    0
     3           20   20    0

Thanks! That was very helpful. –


Can you explain the 'lambda' inside the embedded 'list'? I'm not sure I understand why it needs to be there. –


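The lambda is just another function you apply to 'Reads', like the np.max and np.mean functions you already put there. Every function in that list is applied to 'Reads', which creates three separate columns. – Allen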
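To illustrate the point that each list entry becomes its own sub-column: pandas labels each sub-column with the function's __name__, which is where amax comes from under NumPy 1.x (np.max.__name__ is 'amax' there) and <lambda> comes from for an anonymous function. A sketch under that assumption, using a hypothetical helper named diff so that no rename is needed:

```python
import pandas as pd

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})


def diff(x):
    """Hypothetical helper: max minus mean of one group's Reads values."""
    return x.max() - x.mean()


# Each entry in the list produces one sub-column under 'Reads'; the callable
# is labelled by its __name__, so this one becomes ('Reads', 'diff').
result = df.groupby(['Main', 'Subgroup']).agg({'Reads': ['max', 'mean', diff]})
print(result)
```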


Try this:

In [8]: result = df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean, lambda x: x.max()-x.mean()]}) 

In [9]: result 
Out[9]: 
              Reads
               amax mean <lambda>
Main Subgroup
1    1            5    5        0
     2           10   10        0
     3            7    7        0
2    1            2    2        0
     3           20   20        0

In [10]: result = result.rename(columns={'<lambda>':'diff'}) 

In [11]: result 
Out[11]: 
              Reads
               amax mean diff
Main Subgroup
1    1            5    5    0
     2           10   10    0
     3            7    7    0
2    1            2    2    0
     3           20   20    0
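If your pandas version accepts (name, function) pairs inside the aggregation list (an assumption worth checking against your version's GroupBy.agg documentation), the label can be set in the same call and the separate rename step goes away. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})

# The ('diff', func) pair names the lambda's sub-column directly;
# plain strings such as 'max' keep their own names.
result = df.groupby(['Main', 'Subgroup']).agg(
    {'Reads': ['max', 'mean', ('diff', lambda x: x.max() - x.mean())]}
)
print(result)
```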

+1, as usual, for working with the multi-index :) – Vaishali


You can use a tuple:

result[('Reads','diff')] = result[('Reads','amax')] - result[('Reads','mean')] 

You get:

              Reads
               amax mean diff
Main Subgroup
1    1            5    5    0
     2           10   10    0
     3            7    7    0
2    1            2    2    0
     3           20   20    0
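A self-contained version of the tuple assignment, as a sketch. It uses 'max'/'mean' strings (an assumption, so the labels do not depend on the NumPy version) and indexes result with one full (level-0, level-1) key, so the new column is written to result itself rather than to an intermediate selection.

```python
import pandas as pd

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})
result = df.groupby(['Main', 'Subgroup']).agg({'Reads': ['max', 'mean']})

# A single tuple key addresses the bottom level directly on result,
# so the column is appended next to ('Reads', 'max') and ('Reads', 'mean').
result[('Reads', 'diff')] = result[('Reads', 'max')] - result[('Reads', 'mean')]
print(result)
```

Every group in this sample has a single row, so diff is 0 everywhere; with more rows per group it would be positive wherever the maximum exceeds the mean.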

Simply brilliant! –
