2016-08-15

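Converting '+XX:XX:XX' or '-XX:XX:XX' strings in a DataFrame column with pd.Timedelta

I want to convert a DataFrame column to timedelta, but I'm running into problems.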

My DataFrame:

import pandas as pd
df = pd.DataFrame({'time': ['+06:00:00', '-04:00:00']})

My approach:

df['time'] = pd.Timedelta(df['time']) 

The column comes in the '+XX:XX:XX' format, but I get the error:

ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible 

When I try a simple example:

time = pd.Timedelta('+06:00:00') 

I get the output I want:

Timedelta('0 days 06:00:00') 

What approach should I use to convert a whole Series to timedelta and get the output I expect?

Answers


The error is quite clear:

ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible

What you are passing to pd.Timedelta() is none of the data types listed above; it is a Series:

>>> type(df['time']) 
<class 'pandas.core.series.Series'> 
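A quick way to see the difference is to call pd.Timedelta() on a single element and then on the whole column. This is a minimal sketch; the exact wording of the error message may vary between pandas versions, so it catches both ValueError and TypeError to be safe.

```python
import pandas as pd

df = pd.DataFrame({'time': ['+06:00:00', '-04:00:00']})

# A single element is a plain string, which pd.Timedelta accepts
scalar = pd.Timedelta(df['time'].iloc[0])
print(scalar)  # 0 days 06:00:00

# The whole column is a Series, which pd.Timedelta rejects
try:
    pd.Timedelta(df['time'])
    series_failed = False
except (ValueError, TypeError) as exc:
    series_failed = True
    print(type(exc).__name__, exc)
```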

Perhaps this is what you want:

>>> [pd.Timedelta(x) for x in df['time']] 
[Timedelta('0 days 06:00:00'), Timedelta('-1 days +20:00:00')] 

Or:

>>> df['time'].apply(pd.Timedelta) 
0   06:00:00 
1 -1 days +20:00:00 
Name: time, dtype: timedelta64[ns] 

See the docs for more examples.
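Either result can be written back into the frame. A minimal sketch, assuming the goal is to replace the original string column: apply() already returns a Series aligned on the index, while the list comprehension returns a plain list that pandas converts when it is assigned as a column. The column name 'time_list' is hypothetical, used only to compare the two approaches.

```python
import pandas as pd

df = pd.DataFrame({'time': ['+06:00:00', '-04:00:00']})

# Option 1: element-wise apply returns a Series with a timedelta64 dtype
via_apply = df['time'].apply(pd.Timedelta)

# Option 2: the list comprehension gives a list of Timedelta objects;
# assigning it as a (hypothetical) new column lets pandas infer the dtype
df['time_list'] = [pd.Timedelta(x) for x in df['time']]

# Overwrite the original string column with the converted values
df['time'] = via_apply
print(df.dtypes)
```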


Thanks, the .apply() method works and is what I was looking for. I appreciate your help! – Mike


I would strongly recommend using the purpose-built, vectorized (i.e. very fast) method: to_timedelta()

In [40]: pd.to_timedelta(df['time']) 
Out[40]: 
0   06:00:00 
1 -1 days +20:00:00 
Name: time, dtype: timedelta64[ns] 
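pd.to_timedelta() also accepts an errors parameter. The sketch below assumes the real data might contain malformed entries (the 'garbage' value is made up for illustration) and uses errors='coerce' so they become NaT instead of raising. It then turns the signed offsets into float hours through the .dt accessor.

```python
import pandas as pd

# 'garbage' is a hypothetical malformed entry added for illustration
s = pd.Series(['+06:00:00', '-04:00:00', 'garbage'])

# Vectorized conversion; unparseable strings become NaT
td = pd.to_timedelta(s, errors='coerce')

# Signed offsets in hours (NaT becomes NaN)
hours = td.dt.total_seconds() / 3600
print(hours.tolist())
```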

Timing against a 200K-row DF:

In [41]: df = pd.concat([df] * 10**5, ignore_index=True) 

In [42]: df.shape 
Out[42]: (200000, 1) 

In [43]: %timeit pd.to_timedelta(df['time']) 
1 loop, best of 3: 891 ms per loop 

In [44]: %timeit df['time'].apply(pd.Timedelta) 
1 loop, best of 3: 7.15 s per loop 

In [45]: %timeit [pd.Timedelta(x) for x in df['time']] 
1 loop, best of 3: 5.52 s per loop
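To reproduce the comparison on your own machine, a small harness built on the standard timeit module might look like the sketch below. The row count (20,000) is an arbitrary choice to keep the run short, and absolute times will differ from the numbers above depending on hardware and pandas version, so no particular timings are claimed here. It also checks that all three approaches produce the same values.

```python
import timeit
import pandas as pd

base = pd.DataFrame({'time': ['+06:00:00', '-04:00:00']})
# Replicate the two sample rows to build a larger frame
df = pd.concat([base] * 10_000, ignore_index=True)

candidates = {
    'to_timedelta': lambda: pd.to_timedelta(df['time']),
    'apply': lambda: df['time'].apply(pd.Timedelta),
    'list comprehension': lambda: [pd.Timedelta(x) for x in df['time']],
}

# All three approaches should yield the same Timedelta values
reference = list(pd.to_timedelta(df['time']))
assert list(df['time'].apply(pd.Timedelta)) == reference
assert [pd.Timedelta(x) for x in df['time']] == reference

# Best of 3 single runs per approach, in seconds
results = {
    name: min(timeit.repeat(fn, number=1, repeat=3))
    for name, fn in candidates.items()
}
for name, secs in results.items():
    print(f'{name:>20}: {secs:.4f} s')
```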