2015-04-27 33 views

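Adding 24 hours to a negative time difference in Python

I am using Python to calculate the time interval between two events. Each event has a "beginning time" and an "ending time". I computed the difference between the two in a new column, 'Interval', but it comes out negative when the beginning and ending times fall on different dates (for example, a beginning of 23:46:00 and an ending of 00:21:00 gives -23:25:00). I would like to write an if statement that runs through the 'Interval' column and adds 24 hours to any negative value. However, I am having trouble adding 24 hours to the 'Interval' values. Currently the 'Interval' dtype is timedelta64[ns].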

Here is part of the table to clarify the problem:

     CallDate        BeginningTime           EndingTime   Interval
75   1/8/2009  1900-01-01 07:49:00  1900-01-01 08:19:00   00:30:00
76  1/11/2009  1900-01-01 14:37:00  1900-01-01 14:59:00   00:22:00
77   1/9/2009  1900-01-01 09:29:00  1900-01-01 09:56:00   00:27:00
78  1/11/2009  1900-01-01 09:20:00  1900-01-01 10:13:00   00:53:00
79  1/16/2009  1900-01-01 15:11:00  1900-01-01 15:50:00   00:39:00
80  1/17/2009  1900-01-01 22:52:00  1900-01-01 23:26:00   00:34:00
81  1/19/2009  1900-01-01 05:48:00  1900-01-01 06:32:00   00:44:00
82  1/20/2009  1900-01-01 23:46:00  1900-01-01 00:21:00  -23:25:00
83  1/20/2009  1900-01-01 21:29:00  1900-01-01 22:08:00   00:39:00
84  1/23/2009  1900-01-01 07:33:00  1900-01-01 07:55:00   00:22:00
85  1/30/2009  1900-01-01 19:33:00  1900-01-01 20:01:00   00:28:00

Update: here is the code that got me to this point:

df['BeginningTime'] = pd.to_datetime(df['BeginningTime'], format='%H:%M')
df['EndingTime'] = pd.to_datetime(df['EndingTime'], format='%H:%M')

df['Interval'] = df['EndingTime'] - df['BeginningTime']

df[['CallDate', 'BeginningTime', 'EndingTime', 'Interval']]

Paste the code. – bosnjak

Answers

If you just want to add a timedelta of 1 day whenever the interval is negative:

df['Interval'] = df['Interval'].apply(lambda x: x + pd.Timedelta(days=1) if x < pd.Timedelta(0) else x)
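The same correction can probably be done on the whole column at once, without a per-row lambda. The following is a minimal sketch, assuming the column really is timedelta64[ns] as the question states; the sample times and the variable names (`begin`, `end`, `negative`, `fixed`) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: the middle row crosses midnight.
begin = pd.to_datetime(pd.Series(["07:49", "23:46", "21:29"]), format="%H:%M")
end = pd.to_datetime(pd.Series(["08:19", "00:21", "22:08"]), format="%H:%M")

interval = end - begin  # middle value is negative (-23:25:00)

# Boolean mask of the rows that went negative.
negative = interval < pd.Timedelta(0)

# Replace only the negative rows with "interval + 1 day".
fixed = interval.mask(negative, interval + pd.Timedelta(days=1))

print(fixed)
```

Comparing against `pd.Timedelta(0)` rather than the integer `0` keeps both sides of the comparison the same type.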

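If it is safe to assume that the ending time always falls within 24 hours of the beginning time, you can check whether the ending time is earlier than the beginning time and, if so, use timedelta to add one day to the ending time rather than to the interval.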

from datetime import datetime, timedelta 

d1 = datetime.strptime("1900-01-01 23:46:00", "%Y-%m-%d %H:%M:%S") 
d2 = datetime.strptime("1900-01-01 00:21:00", "%Y-%m-%d %H:%M:%S") 

if d2 < d1: 
    d2 += timedelta(days=1) 

print(d2 - d1)

# 0:35:00 
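Under the same "within 24 hours" assumption, the if statement can also be folded into a modulo, since Python 3 supports `%` between two timedelta objects. This is only a sketch of an alternative; the helper name `wrapped_interval` and the constant `ONE_DAY` are hypothetical and not part of the original answer.

```python
from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)


def wrapped_interval(begin, end):
    """Return end - begin wrapped into the range [0, 24 hours)."""
    # A negative difference such as -23:25:00 wraps around to 0:35:00.
    return (end - begin) % ONE_DAY


d1 = datetime(1900, 1, 1, 23, 46)
d2 = datetime(1900, 1, 1, 0, 21)

print(wrapped_interval(d1, d2))
# 0:35:00
```

Intervals that are already positive and shorter than a day pass through unchanged, so the helper can be applied to every pair of times.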

With pandas, you can do something like this:

import pandas as pd 
from pandas import Timedelta 

d = {
    "CallDate": [
        "1/8/2009",
        "1/11/2009",
        "1/9/2009",
        "1/11/2009",
        "1/16/2009",
        "1/17/2009",
        "1/19/2009",
        "1/20/2009",
        "1/20/2009",
        "1/23/2009",
        "1/30/2009"
    ],
    "BeginningTime": [
        "1900-01-01 07:49:00",
        "1900-01-01 14:37:00",
        "1900-01-01 09:29:00",
        "1900-01-01 09:20:00",
        "1900-01-01 15:11:00",
        "1900-01-01 22:52:00",
        "1900-01-01 05:48:00",
        "1900-01-01 23:46:00",
        "1900-01-01 21:29:00",
        "1900-01-01 07:33:00",
        "1900-01-01 19:33:00"
    ],
    "EndingTime": [
        "1900-01-01 08:19:00",
        "1900-01-01 14:59:00",
        "1900-01-01 09:56:00",
        "1900-01-01 10:13:00",
        "1900-01-01 15:50:00",
        "1900-01-01 23:26:00",
        "1900-01-01 06:32:00",
        "1900-01-01 00:21:00",
        "1900-01-01 22:08:00",
        "1900-01-01 07:55:00",
        "1900-01-01 20:01:00"
    ]
}

df = pd.DataFrame(data=d) 

df['BeginningTime'] = pd.to_datetime(df['BeginningTime'], format="%Y-%m-%d %H:%M:%S")
df['EndingTime'] = pd.to_datetime(df['EndingTime'], format="%Y-%m-%d %H:%M:%S")

def interval(x):
    if x['EndingTime'] < x['BeginningTime']:
        x['EndingTime'] += Timedelta(days=1)
    return x['EndingTime'] - x['BeginningTime']

df['Interval'] = df.apply(interval, axis=1) 

In [2]: df
Out[2]:
          BeginningTime   CallDate           EndingTime  Interval
0   1900-01-01 07:49:00   1/8/2009  1900-01-01 08:19:00  00:30:00
1   1900-01-01 14:37:00  1/11/2009  1900-01-01 14:59:00  00:22:00
2   1900-01-01 09:29:00   1/9/2009  1900-01-01 09:56:00  00:27:00
3   1900-01-01 09:20:00  1/11/2009  1900-01-01 10:13:00  00:53:00
4   1900-01-01 15:11:00  1/16/2009  1900-01-01 15:50:00  00:39:00
5   1900-01-01 22:52:00  1/17/2009  1900-01-01 23:26:00  00:34:00
6   1900-01-01 05:48:00  1/19/2009  1900-01-01 06:32:00  00:44:00
7   1900-01-01 23:46:00  1/20/2009  1900-01-01 00:21:00  00:35:00
8   1900-01-01 21:29:00  1/20/2009  1900-01-01 22:08:00  00:39:00
9   1900-01-01 07:33:00  1/23/2009  1900-01-01 07:55:00  00:22:00
10  1900-01-01 19:33:00  1/30/2009  1900-01-01 20:01:00  00:28:00
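For larger tables, a row-wise apply can be slow, so a column-wise variant of the same idea may be worth trying. The sketch below assumes, as above, that every event ends within 24 hours of its start; the two sample rows are taken from the table, while the names `crosses_midnight` and `adjusted_end` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "BeginningTime": pd.to_datetime(["1900-01-01 22:52:00", "1900-01-01 23:46:00"]),
    "EndingTime": pd.to_datetime(["1900-01-01 23:26:00", "1900-01-01 00:21:00"]),
})

# True where the event ends "before" it begins, i.e. it crossed midnight.
crosses_midnight = df["EndingTime"] < df["BeginningTime"]

# Keep the ending time where it is fine; otherwise shift it forward one day.
adjusted_end = df["EndingTime"].where(
    ~crosses_midnight, df["EndingTime"] + pd.Timedelta(days=1)
)

df["Interval"] = adjusted_end - df["BeginningTime"]

print(df)
```

The shifted ending time is only used for the subtraction, so the stored 'EndingTime' column stays as it was, which matches the output shown above.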