Int - String type error while converting datetime to Unix epoch time

I want to convert a datetime column to Unix epoch time, but I get the error shown below.

Input:

userid,datetime,latitude,longitude 
156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346 
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667 
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339 
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157 
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618 
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898 
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151 
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041 
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313

Program:

import pandas as pd 
import numpy as np 
import io 

df = pd.read_csv('input.csv', 
       #header=None, #no header in csv 
       header=['userid','datetime','latitude','longitude'], #set custom column names 
       parse_dates=['datetime']) #parse columns d, e to datetime 

df['datetime'] = df['datetime'].astype(np.int64) // 10**9 
#df['e'] = df['e'].astype(np.int64) // 10**9 

df.to_csv('output.csv', header=True, index=False)

The program above worked fine on Python 2.7, but now that I have upgraded to Python 3.x with Anaconda, I can no longer get a result.

Error:

File "pandas\parser.pyx", line 519, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5907) 

TypeError: Can't convert 'int' object to str implicitly

Edit: input file here

Source

2017-05-15 Sitz Blogz

The header argument of pd.read_csv expects an int or a list of ints, not a list of strings.

import numpy as np
import pandas as pd
from io import StringIO

# Sample data, including the header row
file = """userid,datetime,latitude,longitude
156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346 
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667 
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339 
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157 
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618 
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898 
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151 
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041 
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313"""

Let's try this read_csv statement:

df = pd.read_csv(StringIO(file),parse_dates=['datetime']) 
df['datetime'] = df['datetime'].astype(np.int64) // 10**9 

print(df.head())

Output:

   userid    datetime   latitude  longitude
0     156  1391209200  41.883672  12.487778
1     187  1391209201  41.928543  12.469037
2     297  1391209201  41.891069  12.492705
3      89  1391209201  41.793177  12.432122
4      79  1391209201  41.900275  12.462746
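
The same epoch values can also be obtained with timestamp arithmetic instead of an integer cast. The following is only a sketch and not part of the original answer: it assumes a pandas version in which to_datetime accepts utc=True, and the names csv_text, stamps and epoch are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Two rows copied from the question's input, with the header row.
csv_text = """userid,datetime,latitude,longitude
156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667
"""

df = pd.read_csv(StringIO(csv_text))

# utc=True converts the "+01" offsets to UTC and yields a tz-aware column.
stamps = pd.to_datetime(df["datetime"], utc=True)

# Whole seconds elapsed since the Unix epoch.
epoch = pd.Timestamp("1970-01-01", tz="UTC")
df["datetime"] = (stamps - epoch) // pd.Timedelta(seconds=1)

print(df["datetime"].tolist())
```

Floor division by a one-second Timedelta discards the fractional seconds, which is the same effect the // 10**9 step has on nanosecond integers.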

Source

2017-05-15 05:03:58

Thanks for the answer, but I get the following error: "ValueError: 'datetime' is not in list". The input CSV file is the same, with no changes at all.

Could you paste the first three lines of the CSV file here?

I have provided the input file as a link; you can download it from there.

If the CSV has no header row, you need the names parameter, plus parse_dates=[1] to parse the second column as datetime:

import pandas as pd 
import numpy as np 
from io import StringIO  # pandas.compat.StringIO was removed in later pandas releases

temp=u"""156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346 
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667 
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339 
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157 
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618 
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898 
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151 
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041 
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313""" 
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), 
       parse_dates=[1], 
       names=['userid','datetime','latitude','longitude']) 
#print (df) 

# check the dtype: a datetime64 dtype means parsing succeeded
print (df['datetime'].dtypes) 
datetime64[ns]

df['datetime'] = df['datetime'].astype(np.int64) // 10**9 
print (df) 
   userid    datetime   latitude  longitude
0     156  1391209200  41.883672  12.487778
1     187  1391209201  41.928543  12.469037
2     297  1391209201  41.891069  12.492705
3      89  1391209201  41.793177  12.432122
4      79  1391209201  41.900275  12.462746
5     191  1391209202  41.852305  12.577407
6     343  1391209202  41.892172  12.469700
7     341  1391209202  41.910213  12.477000
8     260  1391209203  41.865821  12.465522

Another possible problem is bad data, as in the second row of my sample:

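import pandas as pd 
import numpy as np  # needed for the astype(np.int64) step further down
from io import StringIO  # pandas.compat.StringIO was removed in later pandas releases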

temp=u"""156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346 
187,1014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667 
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339 
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157 
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618 
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898 
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151 
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041 
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313""" 
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), 
       parse_dates=[1], 
       names=['userid','datetime','latitude','longitude']) 

#print (df) 

#check dtypes - parse failed, get object dtype 
print (df['datetime'].dtypes) 
object

Parse the column with to_datetime and the parameter errors='coerce', which replaces the bad data with NaT. Then use fillna to replace NaT with some value, for example 0 (1970-01-01 00:00:00.000000):

df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce').fillna(0) 
print (df) 
   userid                   datetime   latitude  longitude
0     156 2014-01-31 23:00:00.739166  41.883672  12.487778
1     187 1970-01-01 00:00:00.000000  41.928543  12.469037
2     297 2014-01-31 23:00:01.220066  41.891069  12.492705
3      89 2014-01-31 23:00:01.470854  41.793177  12.432122
4      79 2014-01-31 23:00:01.631136  41.900275  12.462746
5     191 2014-01-31 23:00:02.048546  41.852305  12.577407
6     343 2014-01-31 23:00:02.647839  41.892172  12.469700
7     341 2014-01-31 23:00:02.709888  41.910213  12.477000
8     260 2014-01-31 23:00:03.458195  41.865821  12.465522


df['datetime'] = df['datetime'].astype(np.int64) // 10**9 
print (df) 
   userid    datetime   latitude  longitude
0     156  1391209200  41.883672  12.487778
1     187           0  41.928543  12.469037
2     297  1391209201  41.891069  12.492705
3      89  1391209201  41.793177  12.432122
4      79  1391209201  41.900275  12.462746
5     191  1391209202  41.852305  12.577407
6     343  1391209202  41.892172  12.469700
7     341  1391209202  41.910213  12.477000
8     260  1391209203  41.865821  12.465522
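
The coerce-then-fill idea can also be written so that NaT is filled with an explicit epoch timestamp rather than the integer 0, which keeps the column's datetime dtype unambiguous. This is a hedged sketch, not the answer's own code: it assumes to_datetime accepts utc=True, and csv_text, stamps and epoch are illustrative names. The year-1014 row above is rejected because it lies outside the range of nanosecond-resolution timestamps (roughly 1677 to 2262); the sketch instead uses a hypothetical malformed value, not-a-date, so the outcome does not depend on timestamp resolution.

```python
from io import StringIO

import pandas as pd

# Headerless sample; the second row holds a hypothetical malformed timestamp.
csv_text = """156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,not-a-date,41.9285433333333,12.4690366666667
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339
"""

df = pd.read_csv(
    StringIO(csv_text),
    names=["userid", "datetime", "latitude", "longitude"],
)

# Values that cannot be parsed become NaT instead of raising an error.
stamps = pd.to_datetime(df["datetime"], errors="coerce", utc=True)

# Fill NaT with the epoch itself so that those rows convert to 0.
epoch = pd.Timestamp("1970-01-01", tz="UTC")
stamps = stamps.fillna(epoch)

# Whole seconds since the epoch.
df["datetime"] = (stamps - epoch) // pd.Timedelta(seconds=1)
print(df["datetime"].tolist())
```

Using 0 as the placeholder makes bad rows indistinguishable from a genuine 1970-01-01 timestamp, so a different sentinel, or dropping the rows, may suit some datasets better.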

Edit:

If the file also has a header row and you need to replace the column names, add header=0 to read_csv.
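
A minimal sketch of that combination, assuming the file's first row is the header to be discarded; the replacement names uid, ts, lat and lon are hypothetical and chosen only for illustration:

```python
from io import StringIO

import pandas as pd

# Sample with a header row whose names we want to override.
csv_text = """userid,datetime,latitude,longitude
156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667
"""

df = pd.read_csv(
    StringIO(csv_text),
    header=0,                           # row 0 is the existing header; it is replaced
    names=["uid", "ts", "lat", "lon"],  # hypothetical replacement column names
)

print(df.columns.tolist())
print(len(df))
```

Without header=0, passing names alone would treat the original header line as a data row.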

Source

2017-05-15 05:35:32 jezrael

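Thank you very much. This is really great! But I can only accept one answer. Do you think you could help me with this question? Migrating from 2.x to 3.x brings so many changes. http://stackoverflow.com/questions/43970972/typeerror-unsupported-operand-types-for-str-and-str-in-python-3-x-anac/43971336#43971336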

Yes, it is your decision which answer to accept. – jezrael

Regarding your second question: which line of code returns the error? – jezrael
