How to convert columns into one datetime column in pandas?

I have a dataframe where the first 3 columns are 'MONTH', 'DAY', 'YEAR'.

Each column holds an integer. Is there a Pythonic way to convert all three columns into datetimes while they are in the dataframe?

From:

M D Y Apples Oranges 
5 6 1990  12  3 
5 7 1990  14  4 
5 8 1990  15  34 
5 9 1990  23  21

To:

Datetimes Apples Oranges 
1990-6-5  12  3 
1990-7-5  14  4 
1990-8-5  15  34 
1990-9-5  23  21

Source

2013-10-13 user1367204

I did this using a loop, but it took a very long time. – user1367204

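In 0.13 (coming very soon) this is heavily optimized and quite fast (though still pretty fast in 0.12); both are orders of magnitude faster than looping.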

In [3]: df 
Out[3]: 
    M D  Y Apples Oranges 
0 5 6 1990  12  3 
1 5 7 1990  14  4 
2 5 8 1990  15  34 
3 5 9 1990  23  21 

In [4]: df.dtypes 
Out[4]: 
M   int64 
D   int64 
Y   int64 
Apples  int64 
Oranges int64 
dtype: object 

# in 0.12, use this 
In [5]: pd.to_datetime((df.Y*10000+df.M*100+df.D).apply(str),format='%Y%m%d') 

# in 0.13 the above or this will work 
In [5]: pd.to_datetime(df.Y*10000+df.M*100+df.D,format='%Y%m%d') 
Out[5]: 
0 1990-05-06 00:00:00 
1 1990-05-07 00:00:00 
2 1990-05-08 00:00:00 
3 1990-05-09 00:00:00 
dtype: datetime64[ns]

Source

2013-10-13 22:23:53 Jeff

Thanks, but I get the error: TypeError: expected string or buffer – user1367204

I edited to show you how to do this in 0.12. ``to_datetime`` needs the data to be stringified. – Jeff

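Thanks, it works. Can you explain the purpose of *10000 and *100? Never mind, the purpose is to turn year 2011, month 5, day 3 into 20110503, which can be parsed easily. Thanks! – user1367204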
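To spell out the arithmetic from the comment above: multiplying the year by 10000 and the month by 100 shifts each part into its own pair of decimal digits, so the sum reads as YYYYMMDD. A minimal plain-Python sketch (the helper name encode_ymd is made up for illustration and is not part of pandas):

```python
from datetime import datetime


def encode_ymd(year, month, day):
    """Pack a date into one integer, e.g. (2011, 5, 3) -> 20110503."""
    return year * 10000 + month * 100 + day


packed = encode_ymd(2011, 5, 3)
# The packed integer, turned into a string, matches the '%Y%m%d' format.
parsed = datetime.strptime(str(packed), "%Y%m%d")

print(packed)         # 20110503
print(parsed.date())  # 2011-05-03
```

Because month and day always occupy two digits each, single-digit values are zero-padded automatically, which is why this is safer than gluing the raw numbers together as strings.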

I came back to this problem and I think I found a solution. I load the csv file in the following way:

pandas_object = DataFrame(read_csv('/Path/to/csv/file', parse_dates=True, index_col = [2,0,1]))

Where:

index_col = [2,0,1]

refers to the columns for [year, month, day].

The only problem now is that I have three new index columns: one for the year, one for the month, and one for the day.

Source

2013-10-13 23:30:46 user1367204

Try 'parse_dates=[[2,0,1]]' (note the double brackets). For an example, see the docstring of 'read_csv'. – TomAugspurger
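As far as I know, recent pandas releases have deprecated combining date columns through a nested list in parse_dates, so a more version-independent route is to read the CSV normally and assemble the dates afterwards. A minimal sketch, assuming the file has the M, D, Y header from the question (the inline csv_text is made-up sample data standing in for the real file):

```python
import io

import pandas as pd

# Made-up sample standing in for the real CSV file.
csv_text = "M,D,Y,Apples,Oranges\n5,6,1990,12,3\n5,7,1990,14,4\n"

df = pd.read_csv(io.StringIO(csv_text))

# Build one datetime column from the three integer columns.
df["Datetimes"] = pd.to_datetime(dict(year=df["Y"], month=df["M"], day=df["D"]))

# Drop the parts and use the combined column as a single index.
df = df.drop(columns=["M", "D", "Y"]).set_index("Datetimes")
print(df)
```

This gives one DatetimeIndex instead of three separate index levels.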

Here is an alternative that uses NumPy datetime64 and timedelta64 arithmetic. It seems to be a bit faster for small DataFrames and much faster for larger DataFrames:

import numpy as np 
import pandas as pd 

df = pd.DataFrame({'M':[1,2,3,4], 'D':[6,7,8,9], 'Y':[1990,1991,1992,1993]}) 
# D M  Y 
# 0 6 1 1990 
# 1 7 2 1991 
# 2 8 3 1992 
# 3 9 4 1993 

y = np.array(df['Y']-1970, dtype='<M8[Y]') 
m = np.array(df['M']-1, dtype='<m8[M]') 
d = np.array(df['D']-1, dtype='<m8[D]') 
dates2 = pd.Series(y+m+d) 
# 0 1990-01-06 
# 1 1991-02-07 
# 2 1992-03-08 
# 3 1993-04-09 
# dtype: datetime64[ns]

In [214]: df = pd.concat([df]*1000) 

In [215]: %timeit pd.to_datetime((df['Y']*10000+df['M']*100+df['D']).astype('int'), format='%Y%m%d') 
100 loops, best of 3: 4.87 ms per loop 

In [216]: %timeit pd.Series(np.array(df['Y']-1970, dtype='<M8[Y]')+np.array(df['M']-1, dtype='<m8[M]')+np.array(df['D']-1, dtype='<m8[D]')) 
1000 loops, best of 3: 839 µs per loop

Here is a helper function to make this easier to use:

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None, 
       seconds=None, milliseconds=None, microseconds=None, nanoseconds=None): 
    years = np.asarray(years) - 1970 
    months = np.asarray(months) - 1 
    days = np.asarray(days) - 1 
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]', 
      '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]') 
    vals = (years, months, days, weeks, hours, minutes, seconds, 
      milliseconds, microseconds, nanoseconds) 
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals) 
       if v is not None) 

In [437]: combine64(df['Y'], df['M'], df['D']) 
Out[437]: array(['1990-01-06', '1991-02-07', '1992-03-08', '1993-04-09'], dtype='datetime64[D]')
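The offsets in the code above follow from how NumPy stores these types: a datetime64[Y] value is a count of years since 1970, while months and days are 1-based on the calendar, so subtracting 1 turns them into offsets from the start of the year and the start of the month. A small sketch with made-up values, using astype instead of np.array(..., dtype=...):

```python
import numpy as np

years = np.array([1990, 1991])
months = np.array([5, 12])
days = np.array([6, 31])

# Years since the 1970 epoch become datetime64[Y] values.
y = (years - 1970).astype("datetime64[Y]")
# Month and day numbers become zero-based timedelta offsets.
m = (months - 1).astype("timedelta64[M]")
d = (days - 1).astype("timedelta64[D]")

# Adding a finer-grained timedelta promotes the result to that unit.
dates = y + m + d
print(dates)
```

The result has dtype datetime64[D]; wrapping it in pd.Series, as in the answer above, converts it to pandas' own datetime dtype.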

Source

2014-09-01 19:39:47 unutbu

I think this, or a function like this, is at least a nice case for which we should figure out an API. – joris

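Yes, having to do things like '*10000' or '-1970' is silly; we should be able to combine the standard time types in a simpler way (and if there *is* a better way but none of us knows it, then there is at least a documentation bug...) – DSM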

Convert the dataframe to strings so the columns can be concatenated easily:

df=df.astype(str)

Then convert the result to datetime, specifying the format:

df.index=pd.to_datetime(df.Y+df.M+df.D,format="%Y%m%d")

This replaces the index rather than creating a new column.
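One caveat with plain string concatenation: unpadded single-digit months and days can produce ambiguous strings. A plain-Python sketch of the problem and of zero-padding as a fix (the helper name to_date is hypothetical):

```python
from datetime import datetime

# Without padding, 11 January and 1 November collapse to the same text.
ambiguous_a = "1990" + "1" + "11"
ambiguous_b = "1990" + "11" + "1"
print(ambiguous_a == ambiguous_b)  # True


def to_date(year, month, day):
    """Zero-pad month and day so the text always has the shape YYYYMMDD."""
    text = str(year) + str(month).zfill(2) + str(day).zfill(2)
    return datetime.strptime(text, "%Y%m%d")


print(to_date(1990, 1, 11).date())  # 1990-01-11
print(to_date(1990, 11, 1).date())  # 1990-11-01
```

In pandas the same padding can be applied to string columns with Series.str.zfill(2) before concatenating.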

Source

2015-06-10 04:30:58

Let's assume you have a dictionary foo holding each part of the date as a parallel list. If so, here is your one-liner:

>>> from datetime import datetime 
>>> import pandas as pd 
>>> foo = {"M": [1,2,3], "D":[30,28,21], "Y":[1980,1981,1982]} 
>>> 
>>> df = pd.DataFrame({"Datetime": [datetime(y,m,d) for y,m,d in zip(foo["Y"],foo["M"],foo["D"])]})

The real guts of it is this bit:

>>> [datetime(y,m,d) for y,m,d in zip(foo["Y"],foo["M"],foo["D"])] 
[datetime.datetime(1980, 1, 30, 0, 0), datetime.datetime(1981, 2, 28, 0, 0), datetime.datetime(1982, 3, 21, 0, 0)]

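This is the sort of thing zip was made for. It takes parallel lists and turns them into tuples. The list comprehension then unpacks each tuple (the for y,m,d in bit) and feeds the values into the datetime object constructor.

pandas seems happy with the datetime objects.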
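Note that the datetime constructor raises ValueError for impossible dates such as 30 February, so a single bad row aborts the whole comprehension. A minimal sketch of one way to tolerate that, assuming it is acceptable to mark bad rows as None (the helper name safe_datetime is made up for illustration):

```python
from datetime import datetime


def safe_datetime(year, month, day):
    """Return a datetime, or None when the parts do not form a valid date."""
    try:
        return datetime(year, month, day)
    except ValueError:
        return None


# The middle entry (30 February 1981) is deliberately invalid.
foo = {"M": [1, 2, 3], "D": [30, 30, 21], "Y": [1980, 1981, 1982]}
dates = [safe_datetime(y, m, d) for y, m, d in zip(foo["Y"], foo["M"], foo["D"])]
print(dates)
```

Whether to skip, flag, or reject invalid rows depends on the data; returning None here is only one option.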

Source

2015-06-10 04:44:12 Dan

In version 0.18.1 you can use to_datetime, but:

- the column names must be year, month, day, hour, minute and second
- at minimum, the columns year, month and day are required

Sample:

import pandas as pd 

df = pd.DataFrame({'year': [2015, 2016], 
        'month': [2, 3], 
        'day': [4, 5], 
        'hour': [2, 3], 
        'minute': [10, 30], 
        'second': [21,25]}) 

print(df) 
    day hour minute month second year 
0 4  2  10  2  21 2015 
1 5  3  30  3  25 2016 

print(pd.to_datetime(df[['year', 'month', 'day']])) 
0 2015-02-04 
1 2016-03-05 
dtype: datetime64[ns] 

print(pd.to_datetime(df[['year', 'month', 'day', 'hour']])) 
0 2015-02-04 02:00:00 
1 2016-03-05 03:00:00 
dtype: datetime64[ns] 

print(pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']])) 
0 2015-02-04 02:10:00 
1 2016-03-05 03:30:00 
dtype: datetime64[ns] 

print(pd.to_datetime(df)) 
0 2015-02-04 02:10:21 
1 2016-03-05 03:30:25 
dtype: datetime64[ns]

Another solution is to convert to a dictionary:

print(df) 
    M D  Y Apples Oranges 
0 5 6 1990  12  3 
1 5 7 1990  14  4 
2 5 8 1990  15  34 
3 5 9 1990  23  21 

print(pd.to_datetime(dict(year=df.Y, month=df.M, day=df.D))) 
0 1990-05-06 
1 1990-05-07 
2 1990-05-08 
3 1990-05-09 
dtype: datetime64[ns]
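Building on the dictionary approach, the columns can also be renamed to the names to_datetime expects, and errors="coerce" can be passed so that impossible dates become NaT instead of raising. A sketch with made-up data (the second row, 30 February, is deliberately invalid); I would expect this to work on any pandas version that supports assembling datetimes from columns:

```python
import pandas as pd

df = pd.DataFrame({"M": [5, 2], "D": [6, 30], "Y": [1990, 1990],
                   "Apples": [12, 14], "Oranges": [3, 4]})

# Rename only the date parts to the names to_datetime looks for.
parts = df[["Y", "M", "D"]].rename(columns={"Y": "year", "M": "month", "D": "day"})

# errors="coerce" turns rows that do not form a valid date into NaT.
df["Datetimes"] = pd.to_datetime(parts, errors="coerce")

# Keep the combined column plus the data columns, as in the question.
result = df[["Datetimes", "Apples", "Oranges"]]
print(result)
```

Rows with NaT can then be inspected or dropped with result.dropna(subset=["Datetimes"]).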

Source

2016-05-08 18:06:27 jezrael

[pd.to_datetime(str(a)+str(b)+str(c), format='%m%d%Y') for a,b,c in zip(df.M, df.D, df.Y)]

Source

2016-11-01 13:51:41
