I think you need to_datetime and sort_values first:
df['Date'] = pd.to_datetime(df['Date'], format='%Y%m')
df = df.sort_values(['ID','Date'])
print (df)
ID Date Highlight
0 1 2015-01-01 B
2 1 2015-07-01 A
1 2 2015-06-01 C
4 2 2015-09-01 A
6 3 2015-01-01 B
3 3 2015-08-01 D
5 3 2015-10-01 B
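Here is a self-contained sketch of this first step. The question's input frame is not shown here, so the rows below are an assumption: hypothetical data reconstructed to match the sorted output above, with Date stored as 'YYYYMM' strings.

```python
import pandas as pd

# Hypothetical input, reconstructed to be consistent with the sorted output;
# Date is assumed to hold 'YYYYMM' strings
df = pd.DataFrame({
    'ID': [1, 2, 1, 3, 2, 3, 3],
    'Date': ['201501', '201506', '201507', '201508', '201509', '201510', '201501'],
    'Highlight': ['B', 'C', 'A', 'D', 'A', 'B', 'B'],
})

# Parse 'YYYYMM' into datetimes (the day defaults to the 1st of the month)
df['Date'] = pd.to_datetime(df['Date'], format='%Y%m')

# Sort by ID, then chronologically within each ID; the original index labels
# are kept, which is why they appear shuffled in the printed frame
df = df.sort_values(['ID', 'Date'])
print(df)
```

Sorting by Date only works as intended after to_datetime (or with zero-padded 'YYYYMM' strings); sorting mixed or unpadded date strings would compare them lexicographically.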
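Then use groupby with the parameter sort=False (df is already sorted by ID, so the default sorting of the group keys is not necessary) and apply with
... list for a column of lists: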
# collect the Highlight values of each ID into a list, keeping row order
df1 = df.groupby('ID', sort=False)['Highlight'] \
.apply(list) \
.reset_index(name='Highlight Sequence')
print (df1)
ID Highlight Sequence
0 1 [B, A]
1 2 [C, A]
2 3 [B, D, B]
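A minimal runnable version of the list step, assuming the already sorted frame from the first step (rebuilt here by hand, with hypothetical data and only the columns that matter). groupby keeps the rows of each group in their current order, so the lists follow the date order produced by sort_values.

```python
import pandas as pd

# Hypothetical frame equal to the sorted output of the first step
df = pd.DataFrame(
    {'ID': [1, 1, 2, 2, 3, 3, 3],
     'Highlight': ['B', 'A', 'C', 'A', 'B', 'D', 'B']},
    index=[0, 2, 1, 4, 6, 3, 5],
)

# sort=False skips re-sorting the keys (they are already in ID order);
# apply(list) turns each group's Highlight values into a Python list
df1 = (df.groupby('ID', sort=False)['Highlight']
         .apply(list)
         .reset_index(name='Highlight Sequence'))
print(df1)
```

reset_index(name=...) turns the grouped Series back into a DataFrame with a fresh 0..n-1 index, which is why the result index no longer matches the original row labels. .agg(list) is a commonly used equivalent of .apply(list) here.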
... join for a string column:
# join the Highlight values of each ID into one comma-separated string
df2 = df.groupby('ID', sort=False)['Highlight'] \
.apply(','.join) \
.reset_index(name='Highlight Sequence')
print (df2)
ID Highlight Sequence
0 1 B,A
1 2 C,A
2 3 B,D,B
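A self-contained sketch of the string variant, again assuming the hypothetical sorted frame from the first step. ','.join receives each group as a Series and iterates over its values, so it only works when those values are already strings.

```python
import pandas as pd

# Hypothetical frame equal to the sorted output of the first step
df = pd.DataFrame(
    {'ID': [1, 1, 2, 2, 3, 3, 3],
     'Highlight': ['B', 'A', 'C', 'A', 'B', 'D', 'B']},
    index=[0, 2, 1, 4, 6, 3, 5],
)

# Concatenate each group's values with ',' as the separator
df2 = (df.groupby('ID', sort=False)['Highlight']
         .apply(','.join)
         .reset_index(name='Highlight Sequence'))
print(df2)
```

If the column held numbers or missing values, converting first (for example with .astype(str)) would be needed, because str.join raises a TypeError on non-string items.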
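But if you need the order given by row position (sorting by the Date column is not important), group the original DataFrame directly, without to_datetime and sort_values: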
df2 = df.groupby('ID', sort=False)['Highlight'] \
.apply(list) \
.reset_index(name='Highlight Sequence')
print (df2)
ID Highlight Sequence
0 1 [B, A]
1 2 [C, A]
2 3 [D, B, B]
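To make the difference between the two orderings concrete, the sketch below builds both results from the same hypothetical unsorted input assumed earlier. Only the group for ID 3 changes, because its rows are not in date order in the original frame.

```python
import pandas as pd

# Hypothetical original (unsorted) input, as assumed in the first step
df = pd.DataFrame({
    'ID': [1, 2, 1, 3, 2, 3, 3],
    'Date': ['201501', '201506', '201507', '201508', '201509', '201510', '201501'],
    'Highlight': ['B', 'C', 'A', 'D', 'A', 'B', 'B'],
})

# Row-position order: group the frame as-is
by_position = (df.groupby('ID', sort=False)['Highlight']
                 .apply(list)
                 .reset_index(name='Highlight Sequence'))

# Chronological order: parse the dates and sort before grouping
dated = df.assign(Date=pd.to_datetime(df['Date'], format='%Y%m'))
by_date = (dated.sort_values(['ID', 'Date'])
                .groupby('ID', sort=False)['Highlight']
                .apply(list)
                .reset_index(name='Highlight Sequence'))

print(by_position)
print(by_date)
```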