初始數據框:
dt user val
0 2016-01-01 a 1
1 2016-01-02 a 33
2 2016-01-05 b 2
3 2016-01-06 b 1
首先,轉換日期日期時間:
x['dt'] = pd.to_datetime(x['dt'])
然後,生成日期和獨特的用戶:
dates = x.set_index('dt').resample('D').asfreq().index
>> DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
'2016-01-05', '2016-01-06'],
dtype='datetime64[ns]', name='dt', freq='D')
users = x['user'].unique()
>> array(['a', 'b'], dtype=object)
這將允許你創建一個MultiIndex:
idx = pd.MultiIndex.from_product((dates, users), names=['dt', 'user'])
>> MultiIndex(levels=[[2016-01-01 00:00:00, 2016-01-02 00:00:00, 2016-01-03 00:00:00, 2016-01-04 00:00:00, 2016-01-05 00:00:00, 2016-01-06 00:00:00], ['a', 'b']],
labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
names=['dt', 'user'])
你可以用它來重新索引您的數據幀:
x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index()
Out:
dt user val
0 2016-01-01 a 1
1 2016-01-01 b 0
2 2016-01-02 a 33
3 2016-01-02 b 0
4 2016-01-03 a 0
5 2016-01-03 b 0
6 2016-01-04 a 0
7 2016-01-04 b 0
8 2016-01-05 a 0
9 2016-01-05 b 2
10 2016-01-06 a 0
11 2016-01-06 b 1
然後可以由用戶進行排序:
x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index().sort_values(by='user')
Out:
dt user val
0 2016-01-01 a 1
2 2016-01-02 a 33
4 2016-01-03 a 0
6 2016-01-04 a 0
8 2016-01-05 a 0
10 2016-01-06 a 0
1 2016-01-01 b 0
3 2016-01-02 b 0
5 2016-01-03 b 0
7 2016-01-04 b 0
9 2016-01-05 b 2
11 2016-01-06 b 1
這是有效的。謝謝。 – broccoli