2017-08-14 153 views

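Create new columns by grouping and aggregating multiple columns in pandas

I have a DataFrame with about 50 columns, among them period_start_time, id, and speed_througput. A sample of the DataFrame: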

   id    period_start_time  speed_througput  ...
0   1  2017-06-14 20:00:00                6
1   1  2017-06-14 20:00:00               10
2   1  2017-06-14 21:00:00                2
3   1  2017-06-14 21:00:00                5
4   2  2017-06-14 20:00:00                8
5   2  2017-06-14 20:00:00               12
...

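I am trying to create two new columns by grouping on two columns (id and period_start_time) and computing the avg and min of speed_througput within each group. This is the code I have tried: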

df['Throughput_avg'] = df.sort_values(['period_start_time'], ascending=False).groupby(['period_start_time', 'id'])[['speed_througput']].max()
df['Throughput_min'] = df.groupby(['period_start_time', 'id'])[['speed_througput']].min()

As you can see, I tried two different approaches, but neither works. Both attempts fail with this error message:

TypeError: incompatible index of inserted column with frame index

I think it is clear what output I need, so I have not posted it.

Answer


Option 1
Use agg in a groupby, then join to attach the result to the main DataFrame

df.join(
    df.groupby(['id', 'period_start_time']).speed_througput.agg(
        ['mean', 'min']
    ).rename(columns={'mean': 'avg'}).add_prefix('Throughput_'),
    on=['id', 'period_start_time']
)

   id    period_start_time  speed_througput  Throughput_avg  Throughput_min
0   1  2017-06-14 20:00:00                6             8.0               6
1   1  2017-06-14 20:00:00               10             8.0               6
2   1  2017-06-14 21:00:00                2             3.5               2
3   1  2017-06-14 21:00:00                5             3.5               2
4   2  2017-06-14 20:00:00                8            10.0               8
5   2  2017-06-14 20:00:00               12            10.0               8
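
For reference, here is a self-contained sketch of Option 1 built from the six sample rows in the question. The names stats and result are illustrative and do not come from the original answer. It also suggests why the direct assignment in the question failed: the aggregated frame has one row per group and is indexed by the (id, period_start_time) pair rather than by the original row labels, so it cannot be aligned with df as a plain column.

```python
import pandas as pd

# Sample data copied from the question (only the three columns shown there).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "period_start_time": pd.to_datetime([
        "2017-06-14 20:00:00", "2017-06-14 20:00:00",
        "2017-06-14 21:00:00", "2017-06-14 21:00:00",
        "2017-06-14 20:00:00", "2017-06-14 20:00:00",
    ]),
    "speed_througput": [6, 10, 2, 5, 8, 12],
})

# One row per (id, period_start_time) group, indexed by a two-level MultiIndex.
stats = (
    df.groupby(["id", "period_start_time"])["speed_througput"]
      .agg(["mean", "min"])
      .rename(columns={"mean": "avg"})
      .add_prefix("Throughput_")
)

# join(on=...) matches the two key columns of df against that MultiIndex,
# so every original row picks up the statistics of its own group.
result = df.join(stats, on=["id", "period_start_time"])
print(result)
```

Note that join returns a new DataFrame; assign the result back (for example df = df.join(...)) if the extra columns should be kept.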

Option 2
Use transform in a groupby context and use assign to add the new columns

g = df.groupby(['id', 'period_start_time']).speed_througput.transform 
df.assign(Throughput_avg=g('mean'), Throughput_min=g('min')) 

   id    period_start_time  speed_througput  Throughput_avg  Throughput_min
0   1  2017-06-14 20:00:00                6             8.0               6
1   1  2017-06-14 20:00:00               10             8.0               6
2   1  2017-06-14 21:00:00                2             3.5               2
3   1  2017-06-14 21:00:00                5             3.5               2
4   2  2017-06-14 20:00:00                8            10.0               8
5   2  2017-06-14 20:00:00               12            10.0               8
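
A self-contained sketch of Option 2 on the same six sample rows may make the difference clearer. The names grouped, avg, and result are illustrative only. Unlike agg, transform returns one value per original row on the original index, so the result can be added as a column without any join.

```python
import pandas as pd

# Sample data copied from the question (only the three columns shown there).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "period_start_time": pd.to_datetime([
        "2017-06-14 20:00:00", "2017-06-14 20:00:00",
        "2017-06-14 21:00:00", "2017-06-14 21:00:00",
        "2017-06-14 20:00:00", "2017-06-14 20:00:00",
    ]),
    "speed_througput": [6, 10, 2, 5, 8, 12],
})

grouped = df.groupby(["id", "period_start_time"])["speed_througput"]

# transform broadcasts each group's statistic back onto the rows of that
# group, so the resulting Series shares df's index and length.
avg = grouped.transform("mean")

# assign returns a new DataFrame with the extra columns; df is unchanged.
result = df.assign(Throughput_avg=avg, Throughput_min=grouped.transform("min"))
print(result)
```

Because the transformed Series already aligns with df, writing df['Throughput_avg'] = grouped.transform('mean') would also work if modifying df in place is preferred.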