2017-06-23 122 views
2

Cumulative list of a column using groupby

I have the following DataFrame:

Fruit metric 
0 Apple  NaN 
1 Apple 100.0 
2 Apple  NaN 
3 Peach 70.0 
4 Pear 120.0 
5 Pear 100.0 
6 Pear  NaN 
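For anyone who wants to reproduce the question, the frame above can be rebuilt roughly as follows. This is a minimal sketch that assumes the missing values are float NaN; the variable name `df` matches the snippets in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; np.nan marks the missing metric values.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Apple", "Peach", "Pear", "Pear", "Pear"],
    "metric": [np.nan, 100.0, np.nan, 70.0, 120.0, 100.0, np.nan],
})
print(df)
```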

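My goal is to group by Fruit and, going through the rows in order, collect every non-null metric value into a cumulative list stored in its own separate column, like this: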

Fruit metric metric_cum 
0 Apple  NaN   [] 
1 Apple 100.0  [100] 
2 Apple  NaN  [100] 
3 Peach 70.0  [70] 
4 Pear 120.0  [120] 
5 Pear 100.0 [120, 100] 
6 Pear  NaN [120, 100] 

I tried this:

df['metric1'] = df['metric'].astype(str) 
df.groupby('Fruit')['metric1'].cumsum() 

But this raises DataError: No numeric types to aggregate

I also tried this:

df.groupby('Fruit')['metric'].apply(list) 

which gives:

Fruit 
Apple  [nan, 100.0, nan] 
Peach     [70.0] 
Pear  [120.0, 100.0, nan] 
Name: metric, dtype: object 

But this is not cumulative, and it cannot be turned into a column of the original frame. Thanks for your help.

Answers

4

Use:

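import pandas as pd
# Wrap each value in a list: NaN becomes [], anything else becomes [int(value)]
df['metric'] = df['metric'].apply(lambda x: [] if pd.isnull(x) else [int(x)]) 
# cumsum concatenates the lists; group_keys=False keeps the original row index (newer pandas otherwise prepends the group key)
df['metric_cum'] = df.groupby('Fruit', group_keys=False)['metric'].apply(lambda x: x.cumsum()) 
print(df) 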
    Fruit metric metric_cum 
0 Apple  []   [] 
1 Apple [100]  [100] 
2 Apple  []  [100] 
3 Peach [70]  [70] 
4 Pear [120]  [120] 
5 Pear [100] [120, 100] 
6 Pear  [] [120, 100] 
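The approach above works because adding two Python lists concatenates them, so a cumulative sum over one-element lists becomes a running concatenation. A minimal pure-Python sketch of that idea for the Pear group, using `itertools.accumulate` in place of pandas' `cumsum` (the names `pear` and `running` are only illustrative):

```python
from itertools import accumulate

# The Pear group after wrapping: present values become [value], NaN becomes [].
pear = [[120], [100], []]

# accumulate uses + by default, and list + list is concatenation.
running = list(accumulate(pear))
print(running)  # [[120], [120, 100], [120, 100]]
```

The empty list acts as the identity for concatenation, which is why rows with a missing metric simply repeat the previous cumulative list.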

Or, to leave the original metric column untouched:

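# Build the wrapped lists in a separate Series so df['metric'] keeps its numeric values
a = df['metric'].apply(lambda x: [] if pd.isnull(x) else [int(x)]) 
# group_keys=False keeps the original row index so the result aligns with df
df['metric_cum'] = a.groupby(df['Fruit'], group_keys=False).apply(lambda x: x.cumsum()) 
print(df) 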
    Fruit metric metric_cum 
0 Apple  NaN   [] 
1 Apple 100.0  [100] 
2 Apple  NaN  [100] 
3 Peach 70.0  [70] 
4 Pear 120.0  [120] 
5 Pear 100.0 [120, 100] 
6 Pear  NaN [120, 100] 

2

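# f turns a scalar into [] (if NaN) or [int(value)]; c is the cumulative sum, which concatenates lists
f = lambda x: pd.Series(x).dropna().astype(int).tolist() 
c = pd.Series.cumsum 
# group_keys=False keeps the original row index so assign() can align the result
df.assign(metric_cum=df.metric.apply(f).groupby(df.Fruit, group_keys=False).apply(c)) 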

    Fruit metric metric_cum 
0 Apple  NaN   [] 
1 Apple 100.0  [100] 
2 Apple  NaN  [100] 
3 Peach 70.0  [70] 
4 Pear 120.0  [120] 
5 Pear 100.0 [120, 100] 
6 Pear  NaN [120, 100]
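If relying on `cumsum` over a column of lists feels fragile, the same result can be built with an explicit pass over the rows. This is only a sketch of an alternative, not the method from either answer; the helper name `cumulative_lists` and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd

def cumulative_lists(frame, key="Fruit", value="metric"):
    """Hypothetical helper: per-group running list of non-null values, in row order."""
    running = {}  # group label -> values seen so far
    out = []
    for label, val in zip(frame[key], frame[value]):
        seen = running.setdefault(label, [])
        if pd.notna(val):
            seen.append(int(val))
        out.append(list(seen))  # copy so rows do not share one list object
    return pd.Series(out, index=frame.index)

df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Apple", "Peach", "Pear", "Pear", "Pear"],
    "metric": [np.nan, 100.0, np.nan, 70.0, 120.0, 100.0, np.nan],
})
df["metric_cum"] = cumulative_lists(df)
print(df)
```

Because the loop follows the row order and tracks each group separately, it does not require the rows of a group to be adjacent.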