2017-08-06
1

I have run into a problem with Python + NumPy + pandas. datetime64 values are corrupted when added to a pandas DataFrame.

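I have a list of timestamps with millisecond precision, encoded as strings. I round them to a resolution of 10 ms, which works fine. When I add the rounded timestamps to a DataFrame as a new column, the error appears - the values of the datetime64 objects are completely corrupted.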

Am I doing something wrong, or is this a pandas/NumPy bug?

By the way, I suspect this error occurs only on Windows - I did not notice it when I tried the same code on a Mac yesterday (I have not verified this).

import numpy 
import pandas as pd 

# We create a list of strings. 
time_str_arr = ['2017-06-30T13:51:15.854', '2017-06-30T13:51:16.250', 
       '2017-06-30T13:51:16.452', '2017-06-30T13:51:16.659'] 
# Then we create a time array, rounded to 10ms (actually floored, 
# not rounded), everything seems to be fine here. 
rounded_time = numpy.array(time_str_arr, dtype="datetime64[10ms]") 
rounded_time 

# Then we create a Pandas DataFrame and assign the time array as a 
# column to it. The datetime64 is destroyed. 
d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']), 
    'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d) 
df = df.assign(wrong_time=rounded_time) 
df 

The output I get:

one two wrong_time 
a 1.0 1.0 1974-10-01 18:11:07.585 
b 2.0 2.0 1974-10-01 18:11:07.625 
c 3.0 3.0 1974-10-01 18:11:07.645 
d NaN 4.0 1974-10-01 18:11:07.665 

Output of pd.show_versions():

INSTALLED VERSIONS 
commit: None 
python: 3.6.1.final.0 
python-bits: 64 
OS: Windows 
OS-release: 10 
machine: AMD64 
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel 
byteorder: little 
LC_ALL: None 
LANG: None 
LOCALE: None.None 

pandas: 0.20.1 
pytest: 3.0.7 
pip: 9.0.1 
setuptools: 27.2.0 
Cython: 0.25.2 
numpy: 1.12.1 
scipy: 0.19.0 
xarray: None 
IPython: 5.3.0 
sphinx: 1.5.6 
patsy: 0.4.1 
dateutil: 2.6.0 
pytz: 2017.2 
blosc: None 
bottleneck: 1.2.1 
tables: 3.2.2 
numexpr: 2.6.2 
feather: None 
matplotlib: 2.0.2 
openpyxl: 2.4.7 
xlrd: 1.0.0 
xlwt: 1.2.0 
xlsxwriter: 0.9.6 
lxml: 3.7.3 
bs4: 4.6.0 
html5lib: 0.999 
sqlalchemy: 1.1.9 
pymysql: None 
psycopg2: None 
jinja2: 2.9.6 
s3fs: None 
pandas_gbq: None 
pandas_datareader: None 
+0

You can use pd.to_datetime(time_str_arr)

+0

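I tried pd.to_datetime(time_str_arr). It did not change anything. The problem is not in converting the strings to datetimes; that step works fine. The problem is that the datetime64 array is corrupted (or imported incorrectly) when I try to add it to the DataFrame.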

Answers

1

In my opinion this is a bug, because numpy.datetime64 values are apparently cast to Timestamps internally.
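The corrupted values in the question are consistent with the raw integer counts of the 10 ms array being read as plain milliseconds. The NumPy-only sketch below reproduces the first 1974 timestamp that way. It is an illustration of the arithmetic, not a confirmed description of what pandas does internally, and the names raw_counts and misread are made up for this example.

```python
import numpy as np

# Parse at millisecond precision first, then cast down to 10 ms units.
ms = np.array(['2017-06-30T13:51:15.854'], dtype='datetime64[ms]')
ten_ms = ms.astype('datetime64[10ms]')

# A datetime64[10ms] value is stored as a count of 10 ms steps since the epoch.
raw_counts = ten_ms.astype('int64')

# Assumption for illustration: the unit multiplier is dropped, so the same
# integers are treated as counts of 1 ms steps.
misread = raw_counts.astype('datetime64[ms]')

print(raw_counts[0])  # 149883067585
print(misread[0])     # 1974-10-01T18:11:07.585
```

The stored integer is ten times too small when read as milliseconds, which is why the dates land in 1974 instead of 2017.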

Using to_datetime works for me:

df = df.assign(wrong_time=pd.to_datetime(rounded_time)) 
print (df) 
    one two    wrong_time 
a 1.0 1.0 2017-06-30 13:51:15.850 
b 2.0 2.0 2017-06-30 13:51:16.250 
c 3.0 3.0 2017-06-30 13:51:16.450 
d NaN 4.0 2017-06-30 13:51:16.650 

Another solution is to cast to ns:

df = df.assign(wrong_time=rounded_time.astype('datetime64[ns]')) 
print (df) 
    one two    wrong_time 
a 1.0 1.0 2017-06-30 13:51:15.850 
b 2.0 2.0 2017-06-30 13:51:16.250 
c 3.0 3.0 2017-06-30 13:51:16.450 
d NaN 4.0 2017-06-30 13:51:16.650 
+1

Ah, ok - I added pd.to_datetime() in the wrong place. This is really useful, thanks!

0

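I opened an issue in the pandas Git repository and got a suggested solution from Jeff Reback: instead of creating the odd 10 ms datetime64 objects, we simply round the timestamps down with the floor() function: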

In [16]: # We create a list of strings. 
...: time_str_arr = ['2017-06-30T13:51:15.854', '2017-06-30T13:51:16.250', 
...:     '2017-06-30T13:51:16.452', '2017-06-30T13:51:16.659'] 

In [17]: pd.to_datetime(time_str_arr).floor('10ms') 
Out[17]: DatetimeIndex(['2017-06-30 13:51:15.850000', '2017-06-30 13:51:16.250000', '2017-06-30 13:51:16.450000', '2017-06-30 13:51:16.650000'], dtype='datetime64[ns]', freq=None) 

Solution from https://github.com/pandas-dev/pandas/issues/17183
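For completeness, here is a minimal sketch of putting the floored timestamps into the DataFrame from the question. The column name right_time is made up for this example, and the frame is cut down to the single 'two' column.

```python
import pandas as pd

time_str_arr = ['2017-06-30T13:51:15.854', '2017-06-30T13:51:16.250',
                '2017-06-30T13:51:16.452', '2017-06-30T13:51:16.659']

# Floor to 10 ms inside pandas, so no datetime64[10ms] array is ever created.
floored = pd.to_datetime(time_str_arr).floor('10ms')

df = pd.DataFrame({'two': pd.Series([1., 2., 3., 4.],
                                    index=['a', 'b', 'c', 'd'])})
# A DatetimeIndex is assigned by position, like a plain array.
df = df.assign(right_time=floored)
print(df)
```

Because the values never leave pandas' own datetime type, there is no unit conversion left to go wrong.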
