2013-07-27 59 views
3
completed    deadline 
15-07-2013 23:10 15-07-2013 23:15 
16-07-2013 00:20 16-07-2013 00:15 
16-07-2013 00:20 16-07-2013 00:15 
16-07-2013 21:04 16-07-2013 21:30 
16-07-2013 21:58 16-07-2013 22:00 
16-07-2013 23:21 16-07-2013 23:15 
16-07-2013 23:21 16-07-2013 23:15 
17-07-2013 00:19 17-07-2013 00:15 
17-07-2013 00:19 17-07-2013 00:15 
17-07-2013 21:18 17-07-2013 21:30 
17-07-2013 22:07 17-07-2013 22:00 

Calculating a time difference with Python pandas and printing it to CSV: when I evaluate data['completed'] - data['deadline'] I get:

-1 day, 23:55:00 # on time 
     0:05:00 
     0:05:00 
-1 day, 23:34:00 # on time 
-1 day, 23:58:00 # on time 
     0:06:00 
     0:06:00 
     0:04:00 
     0:04:00 
-1 day, 23:48:00 # on time 
     0:07:00 

But when I do data['time_delay'] = data['completed'] - data['deadline'] and print data['time_delay'], I get:

-300000000000 
300000000000 
300000000000 
-1560000000000 
-120000000000 
360000000000 
360000000000 
240000000000 
240000000000 
-720000000000 
420000000000 

I get the same result when the output is written to CSV.

How can I:

  1. Handle this output?

  2. Print the output to CSV in 'minutes' format?

  3. Handle the "on time" output?

Answers

2
>>> data = pd.read_csv('1.csv', parse_dates=[0,1]) 
>>> data['time_delay'] = data['completed'] - data['deadline'] 
>>> print data['time_delay'] 
0 -00:05:00 
1 00:05:00 
2 00:05:00 
3 -00:26:00 
4 -00:02:00 
Name: time_delay, dtype: timedelta64[ns] 
>>> data.to_csv(sys.stdout) 
,completed,deadline,time_delay 
0,2013-07-15 23:10:00,2013-07-15 23:15:00,-300000000000 
1,2013-07-16 00:20:00,2013-07-16 00:15:00,300000000000 
2,2013-07-16 00:20:00,2013-07-16 00:15:00,300000000000 
3,2013-07-16 21:04:00,2013-07-16 21:30:00,-1560000000000 
4,2013-07-16 21:58:00,2013-07-16 22:00:00,-120000000000 
>>> data['time_delay'] = data['time_delay'].apply(pd.lib.repr_timedelta64) 
>>> data.to_csv(sys.stdout) 
,completed,deadline,time_delay 
0,2013-07-15 23:10:00,2013-07-15 23:15:00,-00:05:00 
1,2013-07-16 00:20:00,2013-07-16 00:15:00,00:05:00 
2,2013-07-16 00:20:00,2013-07-16 00:15:00,00:05:00 
3,2013-07-16 21:04:00,2013-07-16 21:30:00,-00:26:00 
4,2013-07-16 21:58:00,2013-07-16 22:00:00,-00:02:00 

pandas.lib.repr_timedelta64 is not documented, so this code may break in the future. (I used pandas 0.11.0.)
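Since the helper above is undocumented, a version-independent alternative is to convert the timedelta column to plain minutes before writing the CSV. The following is a minimal sketch, assuming a reasonably recent pandas where the `.dt.total_seconds()` accessor is available; the column name `delay_minutes` and the two sample rows are illustrative and not part of the original answer.

```python
import io

import pandas as pd

# Two sample rows taken from the question, already parsed as datetimes.
df = pd.DataFrame({
    "completed": pd.to_datetime(["2013-07-15 23:10", "2013-07-16 00:20"]),
    "deadline": pd.to_datetime(["2013-07-15 23:15", "2013-07-16 00:15"]),
})

# timedelta64[ns] values are stored as integer nanoseconds, which is why
# the CSV showed -300000000000 (that is -5 minutes).
delay = df["completed"] - df["deadline"]

# Convert to signed minutes; a negative value means the task finished early.
df["delay_minutes"] = delay.dt.total_seconds() / 60

# Write to an in-memory buffer here; pass a file path to write a real file.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Storing a plain number also sidesteps the question of how a timedelta should be rendered as text, and the column reads back as an ordinary float.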

+0

Thanks! I was struggling with this! – richie

+0

FYI, the reverse operation (reading a timedelta column) is not implemented yet; such a column will be read back as object dtype. – Jeff
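One way to work around the limitation Jeff describes is to convert the column explicitly after reading it. This is a sketch only, assuming a pandas version that provides `pd.to_timedelta` and a CSV whose `time_delay` column holds `HH:MM:SS` strings like the ones printed in the answer; the inline CSV text is made-up sample data.

```python
import io

import pandas as pd

# Hypothetical CSV content with a timedelta column stored as text.
csv_text = "time_delay\n-00:05:00\n00:05:00\n"

df = pd.read_csv(io.StringIO(csv_text))

# read_csv does not infer timedeltas, so the column arrives as strings.
was_timedelta = pd.api.types.is_timedelta64_dtype(df["time_delay"])

# Convert the strings to real timedelta values.
df["time_delay"] = pd.to_timedelta(df["time_delay"])
print(df["time_delay"])
```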

1

Try this:

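def func(x, y):
    # x is the completion time, y is the deadline (both timestamps).
    # total_seconds() keeps hours and days, which .seconds would drop.
    if x > y:
        return 'delayed by ' + str(int((x - y).total_seconds() // 60)) + ' minutes'
    else:
        return 'on time by ' + str(int((y - x).total_seconds() // 60)) + ' minutes'


data["ontime"] = data.apply(lambda row: func(row["completed"], row["deadline"]), axis=1)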

This gives:

completed     deadline    ontime 
0 2013-07-15 23:10:00 2013-07-15 23:15:00  on time by 5 minutes 
1 2013-07-16 00:20:00 2013-07-16 00:15:00  delayed by 5 minutes 
2 2013-07-16 00:20:00 2013-07-16 00:15:00  delayed by 5 minutes 
3 2013-07-16 21:04:00 2013-07-16 21:30:00  on time by 26 minutes 
4 2013-07-16 21:58:00 2013-07-16 22:00:00  on time by 2 minutes 
5 2013-07-16 23:21:00 2013-07-16 23:15:00  delayed by 6 minutes 
6 2013-07-16 23:21:00 2013-07-16 23:15:00  delayed by 6 minutes 
7 2013-07-17 00:19:00 2013-07-17 00:15:00  delayed by 4 minutes 
8 2013-07-17 00:19:00 2013-07-17 00:15:00  delayed by 4 minutes 
9 2013-07-17 21:18:00 2013-07-17 21:30:00  on time by 12 minutes 
10 2013-07-17 22:07:00 2013-07-17 22:00:00  delayed by 7 minutes 
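The same labels can also be built without a row-wise apply. The sketch below is one possible vectorized variant, not the answerer's code; it assumes both columns are already datetime64 and uses `numpy.where` to pick the prefix. The three sample rows are copied from the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "completed": pd.to_datetime(
        ["2013-07-15 23:10", "2013-07-16 00:20", "2013-07-16 21:04"]),
    "deadline": pd.to_datetime(
        ["2013-07-15 23:15", "2013-07-16 00:15", "2013-07-16 21:30"]),
})

delay = df["completed"] - df["deadline"]

# True where the task finished after its deadline.
late = delay > pd.Timedelta(0)

# Size of the gap in whole minutes, regardless of direction.
minutes = (delay.abs().dt.total_seconds() // 60).astype(int)

# Choose the prefix per row, then concatenate the pieces as strings.
prefix = pd.Series(np.where(late, "delayed by ", "on time by "), index=df.index)
df["ontime"] = prefix + minutes.astype(str) + " minutes"
print(df)
```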
+0

I hadn't seen the accepted answer, which gives a better solution. –

+0

Nice answer, but when I tried your code I got the following error: ("unsupported operand type(s) for -: 'str' and 'str'", u'occurred at index 0') – richie

+1

I tried this and it works: data["ontime"] = data.apply(lambda row: func(pd.Timestamp(row["completed"]), pd.Timestamp(row["deadline"])), axis=1) – richie
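The 'str' and 'str' error suggests the columns were still strings when func ran. Instead of wrapping each value in pd.Timestamp, the columns can be converted once up front. The sketch below assumes a comma-separated file with the question's day-first layout (an assumption, since the original file is not shown) and passes an explicit format so that a date such as 07-08-2013 cannot be misread as month-first.

```python
import io

import pandas as pd

# Hypothetical comma-separated input using the question's day-first dates.
csv_text = (
    "completed,deadline\n"
    "15-07-2013 23:10,15-07-2013 23:15\n"
    "16-07-2013 00:20,16-07-2013 00:15\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Without parse_dates the columns hold strings, and str - str raises TypeError.
# Convert them once, stating the day-month-year order explicitly.
for col in ("completed", "deadline"):
    df[col] = pd.to_datetime(df[col], format="%d-%m-%Y %H:%M")

delay = df["completed"] - df["deadline"]
print(delay)
```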