Concatenate the two columns into a new column ID and use pivot:
df['ID'] = df['eng_id'].astype(str) + ',' + df['date']
df = df.pivot(index='ID', columns='equipment_id', values='measurement').fillna(0).astype(int)
print (df)
equipment_id  100  200  300  400  500  600
ID
1,2016-01      20   46   18    0    0    0
1,2016-04       0   33    0    0    0    0
1,2016-05       0   27    0    0    0    0
2,2016-01       0    0    9   15    0    0
2,2016-05       0    0    0   65   51   16
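For anyone who wants to run this end to end, here is a minimal sketch. The input frame is an assumption: it is reconstructed from the printed result above, not taken from the question, and the name `wide` is only illustrative.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the printed result above.
df = pd.DataFrame({
    "eng_id":       [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "date":         ["2016-01", "2016-01", "2016-01", "2016-04", "2016-05",
                     "2016-01", "2016-01", "2016-05", "2016-05", "2016-05"],
    "equipment_id": [100, 200, 300, 200, 200, 300, 400, 400, 500, 600],
    "measurement":  [20, 46, 18, 33, 27, 9, 15, 65, 51, 16],
})

# Join the two key columns into a single string key.
df["ID"] = df["eng_id"].astype(str) + "," + df["date"]

# Long -> wide. Missing (ID, equipment_id) pairs become NaN,
# so fill them with 0 and cast back to integers.
wide = (
    df.pivot(index="ID", columns="equipment_id", values="measurement")
      .fillna(0)
      .astype(int)
)
print(wide)
```

The `fillna(0).astype(int)` step is needed because the NaN placeholders force the pivoted columns to float.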
A similar solution with set_index + unstack:
df['ID'] = df['eng_id'].astype(str) + ',' + df['date']
df = df.set_index(['ID', 'equipment_id'])['measurement'].unstack(fill_value=0)
print (df)
equipment_id  100  200  300  400  500  600
ID
1,2016-01      20   46   18    0    0    0
1,2016-04       0   33    0    0    0    0
1,2016-05       0   27    0    0    0    0
2,2016-01       0    0    9   15    0    0
2,2016-05       0    0    0   65   51   16
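A small sketch comparing the two approaches, again on assumed sample data reconstructed from the output above (the names `via_pivot` and `via_unstack` are illustrative). Because unstack takes fill_value directly, no NaN is created along the way and the integer dtype should be kept without an extra cast.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the printed result above.
df = pd.DataFrame({
    "eng_id":       [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "date":         ["2016-01", "2016-01", "2016-01", "2016-04", "2016-05",
                     "2016-01", "2016-01", "2016-05", "2016-05", "2016-05"],
    "equipment_id": [100, 200, 300, 200, 200, 300, 400, 400, 500, 600],
    "measurement":  [20, 46, 18, 33, 27, 9, 15, 65, 51, 16],
})
df["ID"] = df["eng_id"].astype(str) + "," + df["date"]

# pivot: gaps become NaN first, then get filled and cast.
via_pivot = (
    df.pivot(index="ID", columns="equipment_id", values="measurement")
      .fillna(0)
      .astype(int)
)

# set_index + unstack: gaps are filled with 0 while reshaping.
via_unstack = (
    df.set_index(["ID", "equipment_id"])["measurement"]
      .unstack(fill_value=0)
)

# Both frames carry the same labels and the same values.
same_values = bool((via_pivot.values == via_unstack.values).all())
print(same_values)
print(via_unstack.dtypes)
```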
But if you need the ID as 2 separate index levels (eng_id and date) instead of one joined string:
df = df.set_index(['eng_id', 'date', 'equipment_id'])['measurement'].unstack(fill_value=0)
print (df)
equipment_id    100  200  300  400  500  600
eng_id date
1      2016-01   20   46   18    0    0    0
       2016-04    0   33    0    0    0    0
       2016-05    0   27    0    0    0    0
2      2016-01    0    0    9   15    0    0
       2016-05    0    0    0   65   51   16
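One reason to keep eng_id and date as separate index levels is that rows can then be selected by either level. A rough sketch, using assumed sample data reconstructed from the output above (variable names are illustrative):

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the printed result above.
df = pd.DataFrame({
    "eng_id":       [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "date":         ["2016-01", "2016-01", "2016-01", "2016-04", "2016-05",
                     "2016-01", "2016-01", "2016-05", "2016-05", "2016-05"],
    "equipment_id": [100, 200, 300, 200, 200, 300, 400, 400, 500, 600],
    "measurement":  [20, 46, 18, 33, 27, 9, 15, 65, 51, 16],
})

wide = (
    df.set_index(["eng_id", "date", "equipment_id"])["measurement"]
      .unstack(fill_value=0)
)

# All dates for eng_id 1 (the eng_id level is dropped from the result).
eng_1 = wide.loc[1]
print(eng_1)

# One exact (eng_id, date) row and one equipment column.
value = wide.loc[(1, "2016-04"), 200]
print(value)

# Cross-section on the second level: every eng_id for a given date.
may = wide.xs("2016-05", level="date")
print(may)
```

With the joined string ID, the same selections would require string matching instead.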
To turn the index levels back into columns, add reset_index + rename_axis:
# Wrap the chain in parentheses so it can span several lines
df = (df.set_index(['eng_id', 'date', 'equipment_id'])['measurement']
        .unstack(fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
   eng_id     date  100  200  300  400  500  600
0       1  2016-01   20   46   18    0    0    0
1       1  2016-04    0   33    0    0    0    0
2       1  2016-05    0   27    0    0    0    0
3       2  2016-01    0    0    9   15    0    0
4       2  2016-05    0    0    0   65   51   16
But if you get:
ValueError: Index contains duplicate entries, cannot reshape
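it means you have duplicates (more than one row for the same index/column pair), so you need pivot_table with an aggregate function such as mean, sum, ...: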
print (df)
    eng_id     date  equipment_id  measurement
0        1  2016-01           100           20  <- duplicate 1 2016-01 100
1        1  2016-01           100           30  <- duplicate 1 2016-01 100
2        1  2016-01           200           46
3        1  2016-01           300           18
4        1  2016-04           200           33
5        1  2016-05           200           27
6        2  2016-01           300            9
7        2  2016-01           400           15
8        2  2016-05           400           65
9        2  2016-05           500           51
10       2  2016-05           600           16
df['ID'] = df['eng_id'].astype(str) + ',' + df['date']
df = df.pivot_table(index='ID',
                    columns='equipment_id',
                    values='measurement',
                    fill_value=0,
                    aggfunc='mean')
print (df)
equipment_id  100  200  300  400  500  600
ID
1,2016-01      25   46   18    0    0    0  <= (20+30)/2=25
1,2016-04       0   33    0    0    0    0
1,2016-05       0   27    0    0    0    0
2,2016-01       0    0    9   15    0    0
2,2016-05       0    0    0   65   51   16
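Which aggregate function is right depends on what the duplicated rows mean. A short sketch contrasting mean and sum on the sample data shown above (the names `by_mean` and `by_sum` are illustrative):

```python
import pandas as pd

# Sample data with one duplicated (eng_id, date, equipment_id) key,
# copied from the frame printed above.
df = pd.DataFrame({
    "eng_id":       [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "date":         ["2016-01", "2016-01", "2016-01", "2016-01", "2016-04",
                     "2016-05", "2016-01", "2016-01", "2016-05", "2016-05",
                     "2016-05"],
    "equipment_id": [100, 100, 200, 300, 200, 200, 300, 400, 400, 500, 600],
    "measurement":  [20, 30, 46, 18, 33, 27, 9, 15, 65, 51, 16],
})
df["ID"] = df["eng_id"].astype(str) + "," + df["date"]

# Average the duplicated measurements.
by_mean = df.pivot_table(index="ID", columns="equipment_id",
                         values="measurement", fill_value=0, aggfunc="mean")

# Add the duplicated measurements together instead.
by_sum = df.pivot_table(index="ID", columns="equipment_id",
                        values="measurement", fill_value=0, aggfunc="sum")

# Only the duplicated cell differs between the two results.
print(by_mean.loc["1,2016-01", 100])
print(by_sum.loc["1,2016-01", 100])
```

Cells backed by a single row are the same under both functions, so only the duplicated key is affected by the choice.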
Or use groupby + an aggregate function + unstack:
df['ID'] = df['eng_id'].astype(str) + ',' + df['date']
df = df.groupby(['ID', 'equipment_id'])['measurement'].mean().unstack(fill_value=0)
print (df)
equipment_id  100  200  300  400  500  600
ID
1,2016-01      25   46   18    0    0    0  <= (20+30)/2=25
1,2016-04       0   33    0    0    0    0
1,2016-05       0   27    0    0    0    0
2,2016-01       0    0    9   15    0    0
2,2016-05       0    0    0   65   51   16
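Before picking an aggregate function it may be worth looking at the rows that actually collide. The sketch below is one possible way to do that with duplicated and groupby(...).size(), on the sample data shown above; the names `keys`, `dup_rows` and `counts` are illustrative. It also confirms that plain pivot raises the ValueError on this data.

```python
import pandas as pd

# Sample data with one duplicated key, copied from the frame printed above.
df = pd.DataFrame({
    "eng_id":       [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "date":         ["2016-01", "2016-01", "2016-01", "2016-01", "2016-04",
                     "2016-05", "2016-01", "2016-01", "2016-05", "2016-05",
                     "2016-05"],
    "equipment_id": [100, 100, 200, 300, 200, 200, 300, 400, 400, 500, 600],
    "measurement":  [20, 30, 46, 18, 33, 27, 9, 15, 65, 51, 16],
})
df["ID"] = df["eng_id"].astype(str) + "," + df["date"]

keys = ["ID", "equipment_id"]

# keep=False flags every row of a duplicated key, not only the repeats.
dup_rows = df[df.duplicated(subset=keys, keep=False)]
print(dup_rows)

# Number of rows per key; anything above 1 would break pivot.
counts = df.groupby(keys).size()
print(counts[counts > 1])

# Plain pivot cannot place two values into one cell.
pivot_failed = False
try:
    df.pivot(index="ID", columns="equipment_id", values="measurement")
except ValueError as exc:
    pivot_failed = True
    print(exc)
```

Note that mean() returns floating-point values, so recent pandas versions may print 25.0 rather than 25; appending .astype(int) is an option when the averages are known to be whole numbers.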