datetime64 error when adding to a Pandas DataFrame

I have run into a problem with Python + NumPy + Pandas: datetime64 values get corrupted when they are added to a Pandas DataFrame.

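I have a list of timestamps with millisecond precision, encoded as strings. I then round them to 10 ms resolution, which works fine. But when I add the rounded timestamps to a DataFrame as a new column, the bug appears: the datetime64 values are completely corrupted.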

What am I doing wrong? Or is this a Pandas/NumPy bug?

By the way, I suspect the bug only shows up on Windows: I did not notice it when I tried the same code on a Mac yesterday (I have not verified this).

import numpy 
import pandas as pd 

# We create a list of strings. 
time_str_arr = ['2017-06-30T13:51:15.854', '2017-06-30T13:51:16.250', 
       '2017-06-30T13:51:16.452', '2017-06-30T13:51:16.659'] 
# Then we create a time array, rounded to 10ms (actually floored, 
# not rounded), everything seems to be fine here. 
rounded_time = numpy.array(time_str_arr, dtype="datetime64[10ms]") 
rounded_time 

# Then we create a Pandas DataFrame and assign the time array as a 
# column to it. The datetime64 is destroyed. 
d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']), 
    'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d) 
df = df.assign(wrong_time=rounded_time) 
df

Output I get:

one two wrong_time 
a 1.0 1.0 1974-10-01 18:11:07.585 
b 2.0 2.0 1974-10-01 18:11:07.625 
c 3.0 3.0 1974-10-01 18:11:07.645 
d NaN 4.0 1974-10-01 18:11:07.665
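The 1974 dates are not random. The minimal NumPy-only sketch below assumes that the "10" multiplier in datetime64[10ms] is what gets lost on the way into the DataFrame. That is an inference from the numbers, not something confirmed in this thread. Reinterpreting the stored counts of 10 ms ticks as plain milliseconds reproduces the corrupted column above exactly. The strings are parsed at millisecond resolution first and then floored to 10 ms, which mirrors the question's single-step dtype="datetime64[10ms]".

```python
import numpy as np

# Timestamps from the question, parsed at millisecond resolution and then
# floored to a 10 ms unit. numpy stores each value as a count of 10 ms ticks
# since 1970-01-01.
time_str_arr = ['2017-06-30T13:51:15.854', '2017-06-30T13:51:16.250',
                '2017-06-30T13:51:16.452', '2017-06-30T13:51:16.659']
rounded_time = np.array(time_str_arr, dtype="datetime64[ms]").astype("datetime64[10ms]")

# Raw int64 storage: the values are ticks of 10 ms, not milliseconds.
raw_ticks = rounded_time.view("int64")
print(raw_ticks[0])  # 149883067585

# Assumed failure mode: the same integers read as plain milliseconds
# (i.e. the multiplier is dropped) land in October 1974.
misread = raw_ticks.view("datetime64[ms]")
print(misread)  # first element: 1974-10-01T18:11:07.585
```

Each corrupted value keeps the last digits of the correct one (.585, .625, .645, .665) because only the unit was misread, not the integer.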

Output of pd.show_versions():

INSTALLED VERSIONS 
commit: None 
python: 3.6.1.final.0 
python-bits: 64 
OS: Windows 
OS-release: 10 
machine: AMD64 
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel 
byteorder: little 
LC_ALL: None 
LANG: None 
LOCALE: None.None 

pandas: 0.20.1 
pytest: 3.0.7 
pip: 9.0.1 
setuptools: 27.2.0 
Cython: 0.25.2 
numpy: 1.12.1 
scipy: 0.19.0 
xarray: None 
IPython: 5.3.0 
sphinx: 1.5.6 
patsy: 0.4.1 
dateutil: 2.6.0 
pytz: 2017.2 
blosc: None 
bottleneck: 1.2.1 
tables: 3.2.2 
numexpr: 2.6.2 
feather: None 
matplotlib: 2.0.2 
openpyxl: 2.4.7 
xlrd: 1.0.0 
xlwt: 1.2.0 
xlsxwriter: 0.9.6 
lxml: 3.7.3 
bs4: 4.6.0 
html5lib: 0.999 
sqlalchemy: 1.1.9 
pymysql: None 
psycopg2: None 
jinja2: 2.9.6 
s3fs: None 
pandas_gbq: None 
pandas_datareader: None


2017-08-06 Girts Strazdins

You could use pd.to_datetime(time_str_arr).

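I tried pd.to_datetime(time_str_arr). It did not change anything. The bug is not in converting the strings to datetimes; that step works fine. The bug is that when I try to add the datetime64 array to the DataFrame, the array gets corrupted (or is not imported correctly).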

I think this is a bug, since the numpy.datetime64 values are evidently converted to Timestamps internally.

Using to_datetime works for me:

df = df.assign(wrong_time=pd.to_datetime(rounded_time)) 
print (df) 
    one two    wrong_time 
a 1.0 1.0 2017-06-30 13:51:15.850 
b 2.0 2.0 2017-06-30 13:51:16.250 
c 3.0 3.0 2017-06-30 13:51:16.450 
d NaN 4.0 2017-06-30 13:51:16.650

Another solution is to cast to ns:

df = df.assign(wrong_time=rounded_time.astype('datetime64[ns]')) 
print (df) 
    one two    wrong_time 
a 1.0 1.0 2017-06-30 13:51:15.850 
b 2.0 2.0 2017-06-30 13:51:16.250 
c 3.0 3.0 2017-06-30 13:51:16.450 
d NaN 4.0 2017-06-30 13:51:16.650
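The cast-to-ns fix can be wrapped in a small guard so that any datetime64 array, whatever its unit or multiplier, is converted before it reaches pandas. This is a sketch: `to_ns` is a hypothetical helper name, not a pandas or NumPy function, and the DataFrame is a simplified version of the one in the question.

```python
import numpy as np
import pandas as pd


def to_ns(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: cast a datetime64 array of any unit
    (including multiplier units such as datetime64[10ms]) to datetime64[ns]."""
    if not np.issubdtype(arr.dtype, np.datetime64):
        raise TypeError(f"expected a datetime64 array, got {arr.dtype}")
    return arr.astype("datetime64[ns]")


time_str_arr = ['2017-06-30T13:51:15.854', '2017-06-30T13:51:16.250',
                '2017-06-30T13:51:16.452', '2017-06-30T13:51:16.659']
# Floor to 10 ms as in the question (parse at ms first, then coarsen).
rounded_time = np.array(time_str_arr, dtype="datetime64[ms]").astype("datetime64[10ms]")

df = pd.DataFrame({"two": [1.0, 2.0, 3.0, 4.0]}, index=["a", "b", "c", "d"])
# Convert explicitly so pandas never sees the 10 ms unit.
df = df.assign(fixed_time=to_ns(rounded_time))
print(df)
```

Doing the conversion in NumPy keeps the unit change explicit and avoids depending on how a particular pandas version handles unusual datetime64 units.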


2017-08-06 18:34:40 jezrael

Ah, OK, I put pd.to_datetime() in the wrong place. This is really useful, thanks!

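I opened an issue in the pandas Git repository and got a suggested solution from Jeff Reback: instead of creating odd 10 ms datetime64 objects, simply round the timestamps with the floor() function: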

In [16]: # We create a list of strings. 
...: time_str_arr = ['2017-06-30T13:51:15.854', '2017-06-30T13:51:16.250', 
...:     '2017-06-30T13:51:16.452', '2017-06-30T13:51:16.659'] 

In [17]: pd.to_datetime(time_str_arr).floor('10ms') 
Out[17]: DatetimeIndex(['2017-06-30 13:51:15.850000', '2017-06-30 13:51:16.250000', '2017-06-30 13:51:16.450000', '2017-06-30 13:51:16.650000'], dtype='datetime64[ns]', freq=None)

Solution from https://github.com/pandas-dev/pandas/issues/17183
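As a quick sanity check, the sketch below compares the floor() route from the issue with the NumPy cast-to-ns route from the earlier answer and confirms they produce the same instants. It assumes a pandas version that accepts the "10ms" frequency alias in floor().

```python
import numpy as np
import pandas as pd

time_str_arr = ['2017-06-30T13:51:15.854', '2017-06-30T13:51:16.250',
                '2017-06-30T13:51:16.452', '2017-06-30T13:51:16.659']

# Route 1: let pandas parse the strings, then floor to 10 ms.
via_pandas = pd.to_datetime(time_str_arr).floor("10ms")

# Route 2: floor in NumPy via the 10 ms unit, then cast to nanoseconds.
via_numpy = (np.array(time_str_arr, dtype="datetime64[ms]")
             .astype("datetime64[10ms]")
             .astype("datetime64[ns]"))

# Element-wise comparison of the underlying datetime64 values.
same = (via_pandas.values == via_numpy).all()
print(same)
```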


2017-08-07 06:56:03
