2017-09-22 178 views
1

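Adding a new column of timedelta values in pandas

I have a DataFrame, and I want to add another column containing the time difference between two of its columns: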

df['Diff'] = df['End Time'] - df['Open Time']
df['Diff']
0  0 days 01:25:40 
1  0 days 00:41:57 
2  0 days 00:21:47 
3  0 days 16:41:57 
4  0 days 04:32:00 
5  0 days 03:01:57 
6  0 days 01:37:56 
7  0 days 01:13:57 
8  0 days 01:07:56 
9  0 days 02:33:59 
10 29 days 18:33:53 
11 0 days 03:50:56 
12 0 days 01:57:56 

I would like this column in the format '1h 25m', so I tried to compute the hours, including those carried by the days:

diff = df['End Time'] - df['Open Time'] 
hours = diff.dt.days * 24 + diff.dt.components.hours 
minutes = diff.dt.components.minutes 

and got these results:

0  1 
1  0 
2  0 
3  16 
4  4 
5  3 
6  1 
7  1 
8  1 
9  2 
10 714 
11  3 
12  1 
dtype: int64h 0  25 
1  41 
2  21 
3  41 
4  32 
5  1 
6  37 
7  13 
8  7 
9  33 
10 33 
11 50 
12 57 
Name: minutes, dtype: int64m 

But I can't get these results into a new column in this format:

'{}h {}m'.format(hours, minutes)
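
For context, calling str.format on whole Series converts each Series to its printed representation, which is consistent with the single long string shown above. A minimal sketch with made-up values (the two small Series here are illustrative, not the asker's data) contrasts that with formatting row by row:

```python
import pandas as pd

# Hypothetical values standing in for the hours and minutes Series.
hours = pd.Series([1, 0])
minutes = pd.Series([25, 41])

# Formatting whole Series yields one Python string, not one string per row.
combined = '{}h {}m'.format(hours, minutes)
print(type(combined))

# Formatting element by element yields one label per row.
per_row = ['{}h {}m'.format(h, m) for h, m in zip(hours, minutes)]
print(per_row)
```

The per-row list can then be assigned to a new column, provided its length matches the DataFrame.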
+1

Try `['{0}h {1}m'.format(*x) for x in zip(hours, minutes)]`? – Zero

+0

@Zero I wanted to post an answer using the DataFrame. I'm struggling – Dark

+1

Or `hours.astype(str) + 'h' + minutes.astype(str) + 'm'`? – Zero

Answers

1

You can extract the relevant columns, convert them to strings with `astype(str)`, and concatenate the columns as needed.

c = (df['End Time'] - df['Open Time'])\ 
       .dt.components[['days', 'hours', 'minutes']] 
df['diff'] = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm' 
df['diff'] 
0  1h 25m 
1  0h 41m 
2  0h 21m 
3  16h 41m 
4  4h 32m 
5  3h 1m 
6  1h 37m 
7  1h 13m 
8  1h 7m 
9  2h 33m 
10 714h 33m 
11  3h 50m 
12  1h 57m 
Name: diff, dtype: object 
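
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The timestamps are made up for illustration; only the column names 'Open Time' and 'End Time' come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'Open Time': pd.to_datetime(['2017-09-01 08:00:00', '2017-09-01 09:00:00']),
    'End Time': pd.to_datetime(['2017-09-01 09:25:40', '2017-10-01 03:33:53']),
})

# dt.components splits each timedelta into days, hours, minutes, and so on.
c = (df['End Time'] - df['Open Time']).dt.components

# Fold the days into the hour count so long intervals are not truncated.
df['diff'] = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
print(df['diff'].tolist())
```

The second row spans 29 days 18:33:53, so its days contribute 29 * 24 = 696 hours on top of the 18 leftover hours.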
+0

@Bharathshetty No worries, Bharath :-) –

+0

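@COLDSPEED Thanks for this approach, I'll try to implement it. Maybe I wasn't clear, but my aim is not to lose the days. So in this case I want the whole difference expressed in hours. For example, for index 10 the result should be '714h 33m', not '18h 33m'. – bar1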

+0

@bar1 I've fixed it by multiplying the days column by 24. –

1

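You can use `total_seconds` to convert the timedelta to seconds and then compute `hours`, `minutes` and also seconds, which is about 10 times faster than using `dt.components`: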

s = diff.dt.total_seconds().astype(int) 

hours = s // 3600 
# remaining seconds 
s = s - (hours * 3600) 
# minutes 
minutes = s // 60 
# remaining seconds 
seconds = s - (minutes * 60) 

a = hours.astype(str) + 'h ' + minutes.astype(str) 
print (a) 
0  1h 25 
1  0h 41 
2  0h 21 
3  16h 41 
4  4h 32 
5  3h 1 
6  1h 37 
7  1h 13 
8  1h 7 
9  2h 33 
10 714h 33 
11  3h 50 
12  1h 57 
Name: Diff, dtype: object 
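
The output above has no trailing 'm'. A variant that appends it, and that uses the modulo operator instead of reassigning `s`, might look like the sketch below. The two timedeltas are hypothetical stand-ins for df['End Time'] - df['Open Time'], and the arithmetic assumes non-negative differences.

```python
import pandas as pd

# Hypothetical timedeltas standing in for df['End Time'] - df['Open Time'].
diff = pd.Series(pd.to_timedelta(['0 days 01:25:40', '29 days 18:33:53']))

# Whole seconds per row (assumes non-negative differences).
total = diff.dt.total_seconds().astype(int)

# Split into full hours and the minutes left over after removing them.
hours = total // 3600
minutes = (total % 3600) // 60

labels = hours.astype(str) + 'h ' + minutes.astype(str) + 'm'
print(labels.tolist())
```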

Solution from Zero's comment:

hours = diff.dt.days * 24 + diff.dt.components.hours 
minutes = diff.dt.components.minutes 

a = hours.astype(str) + 'h ' + minutes.astype(str) + 'm'
print (a) 
0  1h 25m 
1  0h 41m 
2  0h 21m 
3  16h 41m 
4  4h 32m 
5  3h 1m 
6  1h 37m 
7  1h 13m 
8  1h 7m 
9  2h 33m 
10 714h 33m
11  3h 50m 
12  1h 57m 
dtype: object 

Another option:

a = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)]) 
print (a) 
0  1h 25m 
1  0h 41m 
2  0h 21m 
3  16h 41m 
4  4h 32m 
5  3h 1m 
6  1h 37m 
7  1h 13m 
8  1h 7m 
9  2h 33m 
10 714h 33m 
11  3h 50m 
12  1h 57m 
dtype: object 

Timings

#13000 rows 
df = pd.concat([df]*1000).reset_index(drop=True) 

In [191]: %%timeit 
    ...: hours = diff.dt.days * 24 + diff.dt.components.hours 
    ...: minutes = diff.dt.components.minutes 
    ...: 
    ...: a = hours.astype(str) + 'h ' + minutes.astype(str) 
    ...: 
1 loop, best of 3: 483 ms per loop 

In [192]: %%timeit 
    ...: s = diff.dt.total_seconds().astype(int) 
    ...: 
    ...: hours = s // 3600 
    ...: # remaining seconds 
    ...: s = s - (hours * 3600) 
    ...: # minutes 
    ...: minutes = s // 60 
    ...: # remaining seconds 
    ...: seconds = s - (minutes * 60) 
    ...: 
    ...: a = hours.astype(str) + 'h ' + minutes.astype(str) 
    ...: 
10 loops, best of 3: 43.9 ms per loop 

In [193]: %%timeit 
    ...: hours = diff.dt.days * 24 + diff.dt.components.hours 
    ...: minutes = diff.dt.components.minutes 
    ...: s = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)]) 
    ...: 
1 loop, best of 3: 465 ms per loop 

#cᴏʟᴅsᴘᴇᴇᴅ solution 
In [194]: %%timeit 
    ...: c = diff.dt.components[['days', 'hours', 'minutes']] 
    ...: a = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm' 
    ...: 
1 loop, best of 3: 208 ms per loop 
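
The timings above use IPython's %%timeit. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The data here is randomly generated stand-in data and the function names are made up for this sketch; absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in data: 13,000 random timedeltas of up to 30 days.
rng = np.random.default_rng(0)
diff = pd.Series(pd.to_timedelta(rng.integers(0, 30 * 86400, size=13000), unit='s'))


def with_components():
    # dt.components approach, with days folded into hours.
    c = diff.dt.components
    return (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'


def with_total_seconds():
    # total_seconds approach, using integer arithmetic only.
    s = diff.dt.total_seconds().astype(int)
    return (s // 3600).astype(str) + 'h ' + (s % 3600 // 60).astype(str) + 'm'


# Both approaches should produce identical labels.
assert with_components().equals(with_total_seconds())

for fn in (with_components, with_total_seconds):
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f'{fn.__name__}: {best * 1000:.1f} ms')
```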
+0

Thank you for the effort. It works too – bar1

+0

Yes, I added the timings to show why `dt.components` is best avoided: it is slow. – jezrael