2013-05-27

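Calculating time differences in the index

I'd like to add a deltaT column to a DataFrame, where deltaT is the time difference between consecutive rows (the index is a time series).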

time     value 

2012-03-16 23:50:00  1 
2012-03-16 23:56:00  2 
2012-03-17 00:08:00  3 
2012-03-17 00:10:00  4 
2012-03-17 00:12:00  5 
2012-03-17 00:20:00  6 
2012-03-20 00:43:00  7 

The desired result looks something like the following (deltaT shown in minutes). This is with numpy >= 1.7:

time     value deltaT 

2012-03-16 23:50:00  1  0 
2012-03-16 23:56:00  2  6 
2012-03-17 00:08:00  3  12 
2012-03-17 00:10:00  4  2 
2012-03-17 00:12:00  5  2 
2012-03-17 00:20:00  6  8 
2012-03-20 00:43:00  7  23 

Have a look at some similar questions and the timedelta docs here, in the last code section: http://pandas.pydata.org/pandas-docs/dev/cookbook.html#miscellaneous – Jeff

Answers


Note that for numpy < 1.7, see the conversion described here: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas

Your original frame, with a DatetimeIndex:

In [196]: df 
Out[196]: 
        value 
2012-03-16 23:50:00  1 
2012-03-16 23:56:00  2 
2012-03-17 00:08:00  3 
2012-03-17 00:10:00  4 
2012-03-17 00:12:00  5 
2012-03-17 00:20:00  6 
2012-03-20 00:43:00  7 

In [199]: df.index 
Out[199]: 
<class 'pandas.tseries.index.DatetimeIndex'> 
[2012-03-16 23:50:00, ..., 2012-03-20 00:43:00] 
Length: 7, Freq: None, Timezone: None 
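To reproduce the session, here is a minimal sketch that builds the same frame. The timestamps and values are re-typed from the question, and naming the index `time` is an assumption chosen to match the later output. Modern pandas prints the index differently from the old repr shown above.

```python
import pandas as pd

# Timestamps and values re-typed from the question's sample data
times = pd.to_datetime([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])

# Use the timestamps as a DatetimeIndex named "time" (name is an assumption)
df = pd.DataFrame({"value": range(1, 8)}, index=pd.DatetimeIndex(times, name="time"))
print(df.index)
```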

Here is what you want:

In [200]: df['tvalue'] = df.index 

In [201]: df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0) 

In [202]: df 
Out[202]: 
        value    tvalue   delta 
2012-03-16 23:50:00  1 2012-03-16 23:50:00   00:00:00 
2012-03-16 23:56:00  2 2012-03-16 23:56:00   00:06:00 
2012-03-17 00:08:00  3 2012-03-17 00:08:00   00:12:00 
2012-03-17 00:10:00  4 2012-03-17 00:10:00   00:02:00 
2012-03-17 00:12:00  5 2012-03-17 00:12:00   00:02:00 
2012-03-17 00:20:00  6 2012-03-17 00:20:00   00:08:00 
2012-03-20 00:43:00  7 2012-03-20 00:43:00 3 days, 00:23:00 
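The same shift-and-subtract steps as a self-contained sketch, assuming a recent pandas. It fills the first row's `NaT` with an explicit `pd.Timedelta(0)` rather than the integer `0` used above. That substitution is my own choice, made so the fill value clearly matches the column's timedelta dtype.

```python
import pandas as pd

idx = pd.DatetimeIndex(pd.to_datetime([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
]), name="time")
df = pd.DataFrame({"value": range(1, 8)}, index=idx)

# Copy the index into a column so it can be shifted like ordinary data
df["tvalue"] = df.index
# Subtract the previous row's timestamp; the first row has no predecessor (NaT),
# so fill it with a zero-length Timedelta
df["delta"] = (df["tvalue"] - df["tvalue"].shift()).fillna(pd.Timedelta(0))
print(df)
```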

Getting your answer while ignoring the day part of the timedelta64 (your last day is 3/20, the one before it is 3/17) is actually tricky:

In [204]: df['ans'] = df['delta'].apply(lambda x: x/np.timedelta64(1,'m')).astype('int64') % (24*60) 

In [205]: df 
Out[205]: 
        value    tvalue   delta ans 
2012-03-16 23:50:00  1 2012-03-16 23:50:00   00:00:00 0 
2012-03-16 23:56:00  2 2012-03-16 23:56:00   00:06:00 6 
2012-03-17 00:08:00  3 2012-03-17 00:08:00   00:12:00 12 
2012-03-17 00:10:00  4 2012-03-17 00:10:00   00:02:00 2 
2012-03-17 00:12:00  5 2012-03-17 00:12:00   00:02:00 2 
2012-03-17 00:20:00  6 2012-03-17 00:20:00   00:08:00 8 
2012-03-20 00:43:00  7 2012-03-20 00:43:00 3 days, 00:23:00 23 
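The `apply` with a lambda can likely be replaced by a vectorized division. The sketch below assumes a recent pandas, where dividing a timedelta Series by a one-minute `pd.Timedelta` returns float minutes. The deltas are re-typed from the output above, and the `% (24*60)` step is the same day-dropping trick as in the answer.

```python
import pandas as pd

# The "delta" column from the output above, re-typed as strings
delta = pd.Series(pd.to_timedelta([
    "00:00:00", "00:06:00", "00:12:00", "00:02:00",
    "00:02:00", "00:08:00", "3 days 00:23:00",
]))

# Dividing by a one-minute Timedelta gives float minutes without a Python-level apply
minutes = (delta / pd.Timedelta(minutes=1)).astype("int64")
# Drop whole days (1440 minutes each), as the answer above does
ans = minutes % (24 * 60)
print(ans.tolist())  # [0, 6, 12, 2, 2, 8, 23]
```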

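We can use to_series to create a Series whose index and values are both equal to the index keys, and then compute the difference between consecutive rows, which gives a Series of timedelta64[ns] dtype. Once we have that, the .dt accessor gives us the seconds attribute of the time part, and dividing each element by 60 gives the output in minutes (optionally filling the first value with 0).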

In [13]: df['deltaT'] = df.index.to_series().diff().dt.seconds.div(60, fill_value=0) 
    ...: df         # use .astype(int) to obtain integer values 
Out[13]: 
        value deltaT 
time        
2012-03-16 23:50:00  1  0.0 
2012-03-16 23:56:00  2  6.0 
2012-03-17 00:08:00  3 12.0 
2012-03-17 00:10:00  4  2.0 
2012-03-17 00:12:00  5  2.0 
2012-03-17 00:20:00  6  8.0 
2012-03-20 00:43:00  7 23.0 
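Following the `.astype(int)` hint in the comment above, here is a self-contained sketch that produces integer minutes. The sample data is re-typed from the question and a recent pandas is assumed. Note that `.dt.seconds` is the seconds part *within a day*, which is why the 3-day gap comes out as 23.

```python
import pandas as pd

idx = pd.DatetimeIndex(pd.to_datetime([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
]), name="time")
df = pd.DataFrame({"value": range(1, 8)}, index=idx)

# diff() -> timedelta64; .dt.seconds ignores whole days; NaT in row 0 becomes 0 via fill_value
df["deltaT"] = df.index.to_series().diff().dt.seconds.div(60, fill_value=0).astype(int)
print(df)
```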

Breaking it down:

When we perform the diff:

In [8]: ser_diff = df.index.to_series().diff() 

In [9]: ser_diff 
Out[9]: 
time 
2012-03-16 23:50:00    NaT 
2012-03-16 23:56:00 0 days 00:06:00 
2012-03-17 00:08:00 0 days 00:12:00 
2012-03-17 00:10:00 0 days 00:02:00 
2012-03-17 00:12:00 0 days 00:02:00 
2012-03-17 00:20:00 0 days 00:08:00 
2012-03-20 00:43:00 3 days 00:23:00 
Name: time, dtype: timedelta64[ns] 
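To see why the next step drops the 3 days, the `.dt` accessor exposes the day part and the within-day seconds separately. A small sketch using two of the deltas above, re-typed by hand:

```python
import pandas as pd

ser_diff = pd.Series(pd.to_timedelta(["00:06:00", "3 days 00:23:00"]))

# .dt.days holds the whole-day part; .dt.seconds only the remainder (0..86399)
print(ser_diff.dt.days.tolist())     # [0, 3]
print(ser_diff.dt.seconds.tolist())  # [360, 1380]
```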

Converting seconds to minutes:

In [10]: ser_diff.dt.seconds.div(60, fill_value=0) 
Out[10]: 
time 
2012-03-16 23:50:00  0.0 
2012-03-16 23:56:00  6.0 
2012-03-17 00:08:00 12.0 
2012-03-17 00:10:00  2.0 
2012-03-17 00:12:00  2.0 
2012-03-17 00:20:00  8.0 
2012-03-20 00:43:00 23.0 
Name: time, dtype: float64 
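For the non-negative, whole-minute deltas in this example, `.dt.seconds // 60` should agree with the first answer's "total minutes modulo 1440" approach. Here is a quick consistency check on a few re-typed deltas, assuming a recent pandas:

```python
import pandas as pd

ser_diff = pd.Series(pd.to_timedelta(["00:06:00", "00:12:00", "3 days 00:23:00"]))

# This answer: seconds within the day, converted to minutes
via_seconds = ser_diff.dt.seconds // 60
# First answer: total minutes, then drop whole days with % 1440
via_modulo = (ser_diff / pd.Timedelta(minutes=1)).astype("int64") % (24 * 60)
print((via_seconds == via_modulo).all())  # True
```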

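If you want to include the date part as well, which was excluded above (only the time part was considered), dt.total_seconds gives you the full elapsed duration in seconds, which can then again be divided by 60 to get minutes.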

In [12]: ser_diff.dt.total_seconds().div(60, fill_value=0) 
Out[12]: 
time 
2012-03-16 23:50:00  0.0 
2012-03-16 23:56:00  6.0 
2012-03-17 00:08:00  12.0 
2012-03-17 00:10:00  2.0 
2012-03-17 00:12:00  2.0 
2012-03-17 00:20:00  8.0 
2012-03-20 00:43:00 4343.0 # <-- number of minutes in 3 days 23 minutes 
Name: time, dtype: float64 
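An alternative to `total_seconds()` followed by `div(60)` is to divide the timedelta Series directly by a one-minute `pd.Timedelta`, which likewise counts the days. A sketch assuming a recent pandas, where `NaT` divided by a Timedelta gives `NaN`. The values are re-typed from the diff output above.

```python
import pandas as pd

# First row NaT (no predecessor), then two of the deltas from the diff above
ser_diff = pd.Series(pd.to_timedelta([None, "00:06:00", "3 days 00:23:00"]))

# Dividing by a one-minute Timedelta yields elapsed minutes, days included
minutes = (ser_diff / pd.Timedelta(minutes=1)).fillna(0)
print(minutes.tolist())  # [0.0, 6.0, 4343.0]
```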

The last comment should say "23 minutes" – Corrumpo


Oh, yes. Thanks for pointing that out. –