2016-05-31 136 views
2

我有一個dataframe,並且想要減去前一行的兩列,前提是前一行的值爲相同的Name。如果沒有,那麼我希望它產生NAN並填寫-。我的groupby表達式產生錯誤,TypeError: 'Series' objects are mutable, thus they cannot be hashed,這是非常模糊的。我錯過了什麼?在熊貓中用Groupby減去兩列

import pandas as pd 
df = pd.DataFrame(data=[['Person A', 5, 8], ['Person A', 13, 11], ['Person B', 11, 32], ['Person B', 15, 20]], columns=['Names', 'Value', 'Value1']) 
df['diff'] = df.groupby('Names').apply(df['Value'].shift(1) - df['Value1'].shift(1)).fillna('-') 
print df 

所需的輸出:

 Names Value Value1 diff 
0 Person A  5  8  - 
1 Person A  13  11 -3 
2 Person B  11  32  - 
3 Person B  15  20 -21 

回答

2

您可以添加lambda x和更改df['Value']x['Value'],類似與Value1和最後reset_index

df['diff'] = df.groupby('Names') 
       .apply(lambda x: x['Value'].shift(1) - x['Value1'].shift(1)) 
       .fillna('-') 
       .reset_index(drop=True) 
print (df) 
     Names Value Value1 diff 
0 Person A  5  8 - 
1 Person A  13  11 -3 
2 Person B  11  32 - 
3 Person B  15  20 -21 

DataFrameGroupBy.shift另一種解決方案:

df1 = df.groupby('Names')['Value','Value1'].shift() 
print (df1) 
    Value Value1 
0 NaN  NaN 
1 5.0  8.0 
2 NaN  NaN 
3 11.0 32.0 
df['diff'] = (df1.Value - df1.Value1).fillna('-') 

print (df) 
     Names Value Value1 diff 
0 Person A  5  8 - 
1 Person A  13  11 -3 
2 Person B  11  32 - 
3 Person B  15  20 -21 
1

你也可以這樣來做:

In [76]: df['diff'] = (-df.groupby('Names')[['Value1','Value']].shift(1).diff(axis=1)['Value1']).fillna(0) 

In [77]: df 
Out[77]: 
     Names Value Value1 diff 
0 Person A  5  8 0.0 
1 Person A  13  11 -3.0 
2 Person B  11  32 0.0 
3 Person B  15  20 -21.0