以下是使用NumPy datetime64 and timedelta64 arithmetic的替代方案。這似乎是一個更快一點的小DataFrames和更快較大DataFrames:
import numpy as np
import pandas as pd
df = pd.DataFrame({'M':[1,2,3,4], 'D':[6,7,8,9], 'Y':[1990,1991,1992,1993]})
# D M Y
# 0 6 1 1990
# 1 7 2 1991
# 2 8 3 1992
# 3 9 4 1993
y = np.array(df['Y']-1970, dtype='<M8[Y]')
m = np.array(df['M']-1, dtype='<m8[M]')
d = np.array(df['D']-1, dtype='<m8[D]')
dates2 = pd.Series(y+m+d)
# 0 1990-01-06
# 1 1991-02-07
# 2 1992-03-08
# 3 1993-04-09
# dtype: datetime64[ns]
In [214]: df = pd.concat([df]*1000)
In [215]: %timeit pd.to_datetime((df['Y']*10000+df['M']*100+df['D']).astype('int'), format='%Y%m%d')
100 loops, best of 3: 4.87 ms per loop
In [216]: %timeit pd.Series(np.array(df['Y']-1970, dtype='<M8[Y]')+np.array(df['M']-1, dtype='<m8[M]')+np.array(df['D']-1, dtype='<m8[D]'))
1000 loops, best of 3: 839 µs per loop
這裏只是一個輔助功能,使這個更容易使用:
def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
years = np.asarray(years) - 1970
months = np.asarray(months) - 1
days = np.asarray(days) - 1
types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
'<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
vals = (years, months, days, weeks, hours, minutes, seconds,
milliseconds, microseconds, nanoseconds)
return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
if v is not None)
In [437]: combine64(df['Y'], df['M'], df['D'])
Out[437]: array(['1990-01-06', '1991-02-07', '1992-03-08', '1993-04-09'], dtype='datetime64[D]')
我這樣做是使用一個循環,但它花了很長時間。 – user1367204