2017-05-15

int - string type error while converting datetime to Unix epoch time

I want to convert datetime values to Unix epoch time, but I get the error shown below.

Input:

userid,datetime,latitude,longitude 
156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346 
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667 
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339 
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157 
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618 
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898 
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151 
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041 
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313 

Program:

import pandas as pd 
import numpy as np 
import io 

df = pd.read_csv('input.csv', 
       #header=None, #no header in csv 
       header=['userid','datetime','latitude','longitude'], #set custom column names 
       parse_dates=['datetime']) #parse columns d, e to datetime 

df['datetime'] = df['datetime'].astype(np.int64) // 10**9 
#df['e'] = df['e'].astype(np.int64) // 10**9 

df.to_csv('output.csv', header=True, index=False) 

The program above worked fine on Python 2.7, but since I upgraded to Python 3.x with Anaconda I can no longer get a result.

Error:

File "pandas\parser.pyx", line 519, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5907) 

TypeError: Can't convert 'int' object to str implicitly 

Edit: input file here

Answers

1

The header argument of pd.read_csv expects an int or a list of ints, not a list of strings.

import numpy as np 
import pandas as pd 
from io import StringIO 
file=""" 
userid,datetime,latitude,longitude 
156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346 
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667 
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339 
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157 
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618 
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898 
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151 
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041 
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313""" 

Let's try this read_csv call:

df = pd.read_csv(StringIO(file),parse_dates=['datetime']) 
df['datetime'] = df['datetime'].astype(np.int64) // 10**9 

print(df.head()) 

Output:

userid datetime latitude longitude 
0  156 1391209200 41.883672 12.487778 
1  187 1391209201 41.928543 12.469037 
2  297 1391209201 41.891069 12.492705 
3  89 1391209201 41.793177 12.432122 
4  79 1391209201 41.900275 12.462746 
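As a cross-check on the first value in that output, here is a minimal sketch that computes the epoch seconds for the first input row with only the standard library. The timestamp is typed in by hand from the sample data, and the variable names are my own rather than anything from the answer.

```python
from datetime import datetime, timedelta, timezone

# First timestamp from the sample input, written out by hand with its +01:00 offset.
tz_plus_one = timezone(timedelta(hours=1))
first_row = datetime(2014, 2, 1, 0, 0, 0, 739166, tzinfo=tz_plus_one)

# timestamp() returns seconds since 1970-01-01 00:00:00 UTC as a float;
# int() drops the fractional part, like the // 10**9 step on nanoseconds.
epoch_seconds = int(first_row.timestamp())
print(epoch_seconds)  # 1391209200
```

Midnight at +01:00 is 23:00 UTC on the previous day, which is why the value is 3600 seconds less than midnight UTC on 2014-02-01.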
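Thanks for the answer, but I get the following error: "ValueError: 'datetime' is not in list". The CSV file is the same input, with no changes at all.

Can you paste the first three lines of the CSV file here?

I have provided the input file as a link; you can download it from there.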

2

If the CSV has no header row, you need the names parameter together with parse_dates=[1], which tries to parse the second column (datetime):

import pandas as pd 
import numpy as np 
from io import StringIO  # pandas.compat.StringIO is gone from newer pandas releases

temp=u"""156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346 
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667 
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339 
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157 
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618 
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898 
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151 
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041 
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313""" 
#after testing, replace 'StringIO(temp)' with 'filename.csv' 
df = pd.read_csv(StringIO(temp), 
       parse_dates=[1], 
       names=['userid','datetime','latitude','longitude']) 
#print (df) 

#check dtypes - if it is datetime, parsing worked 
print (df['datetime'].dtypes) 
datetime64[ns] 
df['datetime'] = df['datetime'].astype(np.int64) // 10**9 
print (df) 
    userid datetime latitude longitude 
0  156 1391209200 41.883672 12.487778 
1  187 1391209201 41.928543 12.469037 
2  297 1391209201 41.891069 12.492705 
3  89 1391209201 41.793177 12.432122 
4  79 1391209201 41.900275 12.462746 
5  191 1391209202 41.852305 12.577407 
6  343 1391209202 41.892172 12.469700 
7  341 1391209202 41.910213 12.477000 
8  260 1391209203 41.865821 12.465522 

Another possible problem is bad data, as in the second row of my sample (year 1014):

import pandas as pd 
from io import StringIO  # pandas.compat.StringIO is gone from newer pandas releases

temp=u"""156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346 
187,1014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667 
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339 
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157 
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618 
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898 
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151 
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041 
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313""" 
#after testing, replace 'StringIO(temp)' with 'filename.csv' 
df = pd.read_csv(StringIO(temp), 
       parse_dates=[1], 
       names=['userid','datetime','latitude','longitude']) 

#print (df) 

#check dtypes - parse failed, get object dtype 
print (df['datetime'].dtypes) 
object 

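Parse the datetime column with to_datetime and the parameter errors='coerce'. It replaces bad data with NaT. Then replace NaT with some value (for example 0, which becomes 1970-01-01 00:00:00.000000) using fillna: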

df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce').fillna(0) 
print (df) 
    userid     datetime latitude longitude 
0  156 2014-01-31 23:00:00.739166 41.883672 12.487778 
1  187 1970-01-01 00:00:00.000000 41.928543 12.469037 
2  297 2014-01-31 23:00:01.220066 41.891069 12.492705 
3  89 2014-01-31 23:00:01.470854 41.793177 12.432122 
4  79 2014-01-31 23:00:01.631136 41.900275 12.462746 
5  191 2014-01-31 23:00:02.048546 41.852305 12.577407 
6  343 2014-01-31 23:00:02.647839 41.892172 12.469700 
7  341 2014-01-31 23:00:02.709888 41.910213 12.477000 
8  260 2014-01-31 23:00:03.458195 41.865821 12.465522 


df['datetime'] = df['datetime'].astype(np.int64) // 10**9 
print (df) 
    userid datetime latitude longitude 
0  156 1391209200 41.883672 12.487778 
1  187   0 41.928543 12.469037 
2  297 1391209201 41.891069 12.492705 
3  89 1391209201 41.793177 12.432122 
4  79 1391209201 41.900275 12.462746 
5  191 1391209202 41.852305 12.577407 
6  343 1391209202 41.892172 12.469700 
7  341 1391209202 41.910213 12.477000 
8  260 1391209203 41.865821 12.465522 
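The same coerce-then-fill idea can also be written without relying on fillna(0) being interpreted as a timestamp, which may behave differently across pandas versions. The following is only a sketch under my own assumptions: the two-row sample is hypothetical (its second timestamp is deliberately unparseable text rather than the year-1014 row), and utc=True plus the explicit epoch subtraction are my choices, not part of the answer.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row sample: the second timestamp is deliberately unparseable.
raw = """156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,not a timestamp,41.9285433333333,12.4690366666667
"""

df = pd.read_csv(StringIO(raw),
                 names=["userid", "datetime", "latitude", "longitude"])

# utc=True normalizes the +01 offset to UTC; errors="coerce" turns bad values into NaT.
parsed = pd.to_datetime(df["datetime"], errors="coerce", utc=True)

# Whole seconds since the epoch; rows that were NaT end up missing and are filled with 0.
epoch = pd.Timestamp("1970-01-01", tz="UTC")
seconds = (parsed - epoch) // pd.Timedelta(seconds=1)
df["datetime"] = seconds.fillna(0).astype("int64")

print(df)
```

Filling after the division keeps the sentinel (0 here) as an explicit integer choice, so it is easy to swap for another marker such as -1.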

Edit:

If the file also has a header row and you want to replace its column names, add header=0 to read_csv along with names.
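A small sketch of that combination, using two rows of the sample input with its header line. The replacement column names (uid, ts, lat, lon) are made up for illustration.

```python
from io import StringIO

import pandas as pd

raw = """userid,datetime,latitude,longitude
156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667
"""

# header=0 marks the first line as the existing header row, so it is not read as data;
# names supplies the replacement labels (hypothetical names chosen for this example).
df = pd.read_csv(StringIO(raw), header=0, names=["uid", "ts", "lat", "lon"])

print(df.columns.tolist())
print(len(df))
```

Without header=0, passing names alone would treat the original header line as a data row.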

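Thank you very much, this is really great! But I can only accept one answer. Do you think you could help me with this other question? Migrating from 2.x to 3.x brings so many changes: http://stackoverflow.com/questions/43970972/typeerror-unsupported-operand-types-for-str-and-str-in-python-3-x-anac/43971336#43971336

Yes, it is your decision which answer gets accepted. – jezrael

In your second question, which line of code raises the error? – jezrael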
