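I want to use a datetime as the main index, but it contains many duplicates. What I would like to do is add artificial milliseconds within each group of identical seconds, acting as a "counter", so that the time index is de-duplicated by the added milliseconds.
For example, this is what the original DataFrame looks like: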
Bid BidVol
2016-06-27 13:00:10 4183.50 0
2016-06-27 13:00:10 4183.50 0
2016-06-27 13:00:10 4183.50 0
2016-06-28 13:00:10 4249.25 1
2016-06-28 13:00:10 4249.25 1
2016-06-28 13:00:10 4249.00 1
2016-06-28 13:00:10 4248.75 1
2016-06-28 13:00:10 4248.75 2
2016-06-28 13:00:10 4248.75 1
2016-06-28 13:00:10 4248.75 2
2016-06-28 13:00:12 4248.50 0
2016-06-28 13:00:12 4248.50 0
2016-06-29 13:00:12 4353.75 0
2016-06-29 13:00:12 4353.75 0
2016-06-29 13:00:12 4353.75 0
2016-06-29 13:00:12 4354.00 1
2016-06-29 13:00:12 4354.00 1
2016-06-29 13:00:12 4353.75 0
2016-06-29 13:00:12 4354.00 1
2016-06-29 13:00:12 4354.00 1
2016-06-29 13:00:12 4354.00 1
2016-06-29 13:00:12 4354.00 1
2016-06-30 13:00:10 4394.00 0
2016-06-30 13:00:11 4394.25 1
2016-06-30 13:00:11 4394.00 0
My goal is to change the duplicated rows so that the index looks like this:
2016-06-28 13:00:10
2016-06-28 13:00:10.001000
2016-06-28 13:00:10.002000
2016-06-28 13:00:10.003000
2016-06-28 13:00:10.004000
2016-06-28 13:00:10.005000
2016-06-28 13:00:10.006000
I tried playing with the groupby function, and I can use it to print the timestamps with milliseconds added in a loop:
for name, group in test.groupby(test.index):
    print('------')
    i = 0
    for idx, values in group.iterrows():
        print(idx + pd.Timedelta(milliseconds=i))
        i += 1
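However, I don't know the most efficient way to actually change the index to get the result I need. Efficiency matters a lot here, because the main dataset is very large.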
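For reference, here is a minimal sketch of a loop-free version of the same idea. It assumes the index is already a DatetimeIndex and that no timestamp repeats 1000 times or more (otherwise the offsets would spill into the next second). The small `test` frame below is made-up sample data standing in for the real one. `groupby(level=0).cumcount()` numbers the rows within each group of identical timestamps, and `pd.to_timedelta(..., unit="ms")` turns that counter into millisecond offsets.

```python
import pandas as pd

# Made-up sample standing in for the real DataFrame (assumption).
idx = pd.to_datetime([
    "2016-06-27 13:00:10",
    "2016-06-27 13:00:10",
    "2016-06-28 13:00:10",
    "2016-06-28 13:00:10",
    "2016-06-28 13:00:10",
    "2016-06-28 13:00:12",
])
test = pd.DataFrame(
    {"Bid": [4183.50, 4183.50, 4249.25, 4249.25, 4249.00, 4248.50],
     "BidVol": [0, 0, 1, 1, 1, 0]},
    index=idx,
)

# Position of each row inside its group of identical timestamps: 0, 1, 2, ...
counter = test.groupby(level=0).cumcount()

# Convert the counter to millisecond offsets and add them to the index.
# to_numpy() drops the (duplicated) index so the addition is purely positional.
test.index = test.index + pd.to_timedelta(counter.to_numpy(), unit="ms")
```

The first row of each group keeps its original timestamp, and the following rows get +1 ms, +2 ms, and so on, without any Python-level loop over the rows.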