Python pandas: overwriting timestamps

I have a dataset in the following format.

420,426,2013-04-28T23:59:21,7,20 
421,427,2013-04-28T23:59:21,13,12 
422,428,2013-04-28T23:59:22,10,16 
423,510,2013-04-28T23:59:22,0,1 
424,511,2013-04-28T23:59:22,9,0 
425,1,2013-04-29T00:04:21,19,5 
426,2,2013-04-29T00:04:21,25,1 
427,3,2013-04-29T00:04:22,14,7 
428,4,2013-04-29T00:04:22,18,2

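I am using pandas and working with a huge dataset. I want to split the data into 5-minute intervals, and I am using the code below to obtain the groups.

Is there an efficient way to replace the timestamps in the original dataset with the timestamp of the group each row falls into? In this example, the first five rows should all be stamped with the same timestamp, namely the timestamp of their group.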

import os

import numpy as np
import pandas as pd

file_name = os.path.join("..", "..", "Dataset", "all_rawdata.csv")

# "time" is read as text here and converted to datetimes on the next step
dataset = pd.read_csv(file_name, dtype={"ID": np.int32, "station": np.int32, "time": str, "slots": np.int32, "available": np.int32})

dataset['time'] = pd.to_datetime(dataset['time'])
dataset.set_index(dataset.time, inplace=True)

# pd.Grouper(freq=...) replaces the TimeGrouper class, which current pandas no longer provides
data1 = dataset.groupby(pd.Grouper(freq='5min'))
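
To make the grouping concrete, here is a minimal sketch that runs the same 5-minute grouping on the sample rows above. It assumes the file has no header row and that the columns are named as in the dtype mapping (ID, station, time, slots, available); the inline `sample` string stands in for the real CSV file.

```python
import io

import pandas as pd

# Sample rows copied from the question; stands in for all_rawdata.csv
sample = """420,426,2013-04-28T23:59:21,7,20
421,427,2013-04-28T23:59:21,13,12
422,428,2013-04-28T23:59:22,10,16
423,510,2013-04-28T23:59:22,0,1
424,511,2013-04-28T23:59:22,9,0
425,1,2013-04-29T00:04:21,19,5
426,2,2013-04-29T00:04:21,25,1
427,3,2013-04-29T00:04:22,14,7
428,4,2013-04-29T00:04:22,18,2
"""

# Assumed column names, taken from the dtype mapping in the question
columns = ["ID", "station", "time", "slots", "available"]

dataset = pd.read_csv(
    io.StringIO(sample), header=None, names=columns, parse_dates=["time"]
)
dataset = dataset.set_index("time")

# Each group is labelled with the start of its 5-minute bin
group_sizes = dataset.groupby(pd.Grouper(freq="5min")).size()
print(group_sizes)
```

The first five rows fall into the bin starting at 23:55:00 and the remaining four into the bin starting at 00:00:00, so those bin starts are the timestamps the question wants written back onto the rows.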

Source

2014-11-15 user3103646

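Can you post some sample output? It is not clear what you want, but it sounds like you only need the first 5 instances in each 5-minute group of data. Is that right?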

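Use the GroupBy object's .transform method (the code below is updated for current pandas, where GroupBy.head makes the same selection directly):

import numpy
import pandas

# pandas.date_range replaces the DatetimeIndex(start=..., end=..., freq=...) constructor form
dtindex = pandas.date_range(
    start='2012-01-01',
    end='2012-01-15',
    freq='10s'
)

df = pandas.DataFrame(
    data=numpy.random.normal(size=(len(dtindex), 2)),
    index=dtindex,
    columns=['A', 'B']
)
# pandas.Grouper(freq=...) replaces pandas.TimeGrouper
groups_5min = df.groupby(pandas.Grouper(freq='5min'))
# Originally written as groups_5min.transform(lambda g: g.head(5));
# GroupBy.head(5) keeps the first 5 rows of each 5-minute group.
first_5_of_everything = groups_5min.head(5)
print(first_5_of_everything.head(20))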


                            A         B
2012-01-01 00:00:00  1.596596  0.523592
2012-01-01 00:00:10 -0.922953  0.496072
2012-01-01 00:00:20  0.307187 -1.336588
2012-01-01 00:00:30  1.063472  0.700835
2012-01-01 00:00:40  0.818054 -2.150868
2012-01-01 00:05:00 -1.457456  0.239977  # <--- jumps ahead
2012-01-01 00:05:10 -0.918154  1.391162
2012-01-01 00:05:20  0.032661  0.197498
2012-01-01 00:05:30 -1.788646 -0.539537
2012-01-01 00:05:40 -0.147163  0.953631
2012-01-01 00:10:00  0.226996 -0.327286  # <--- jumps ahead
2012-01-01 00:10:10 -0.514218  0.053867
2012-01-01 00:10:20 -0.627977 -1.370492
2012-01-01 00:10:30 -0.217245 -0.979994
2012-01-01 00:10:40 -0.164559  0.799679
2012-01-01 00:15:00  0.155583 -1.489055  # <--- jumps ahead
2012-01-01 00:15:10 -1.557037 -1.285676
2012-01-01 00:15:20  0.555650  0.223248
2012-01-01 00:15:30 -0.619089  0.954938
2012-01-01 00:15:40  0.371026  2.906548
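
If the goal is instead what the question literally asks for, overwriting each row's timestamp with the label of its 5-minute group, one possible approach is to round every timestamp down to the start of its bin. This is a sketch and not part of the original answer: it uses `Series.dt.floor`, assumes the groups are labelled by their left edge (the default for a 5-minute `pd.Grouper`), and reuses the question's sample rows with assumed column names.

```python
import io

import pandas as pd

# Sample rows copied from the question; stands in for the real CSV file
sample = """420,426,2013-04-28T23:59:21,7,20
421,427,2013-04-28T23:59:21,13,12
422,428,2013-04-28T23:59:22,10,16
423,510,2013-04-28T23:59:22,0,1
424,511,2013-04-28T23:59:22,9,0
425,1,2013-04-29T00:04:21,19,5
426,2,2013-04-29T00:04:21,25,1
427,3,2013-04-29T00:04:22,14,7
428,4,2013-04-29T00:04:22,18,2
"""

# Assumed column names, taken from the dtype mapping in the question
columns = ["ID", "station", "time", "slots", "available"]

dataset = pd.read_csv(
    io.StringIO(sample), header=None, names=columns, parse_dates=["time"]
)

# Overwrite every timestamp with the start of its 5-minute bin.
# This is a vectorised operation, so no Python-level loop over groups is needed.
dataset["time"] = dataset["time"].dt.floor("5min")
print(dataset)
```

After this step the first five rows all carry 2013-04-28 23:55:00 and the last four carry 2013-04-29 00:00:00. When the timestamps live in the index rather than a column, `dataset.index = dataset.index.floor("5min")` should have the same effect.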

Source

2014-11-16 03:23:30
