Reading a CSV with a timestamp column, with pandas

When doing this:

import pandas 
x = pandas.read_csv('data.csv', parse_dates=True, index_col='DateTime', 
           names=['DateTime', 'X'], header=None, sep=';')

with this data.csv file:

1449054136.83;15.31 
1449054137.43;16.19 
1449054138.04;19.22 
1449054138.65;15.12 
1449054139.25;13.12

(the first column is a UNIX timestamp, i.e. seconds elapsed since 1/1/1970), I get this error when resampling the data every 15 seconds with x.resample('15S'):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex

It is as if the "datetime" information had not been parsed:

                  X
DateTime
1.449054e+09  15.31
1.449054e+09  16.19
...

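How do I import a CSV whose dates are stored as timestamps with the pandas module?

Then, once I am able to import the CSV, how do I access the rows whose date is > 2015-12-02 12:02:18?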

Source

2015-12-06 Basj

The default separator for read_csv is a comma; try passing `sep=';'` to `read_csv` – EdChum

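@EdChum Sorry, I already have `sep=';'` in my code; I forgot to include it in the question here. I have also edited the question to add some explanation of what does not work – Basj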

I think this question is the same as http://stackoverflow.com/questions/12251483/idiomatic-way-to-parse-posix-timestamps-in-pandas. –

My solution is similar to Mike's:

import pandas 
import datetime 
def dateparse (time_in_secs):  
    return datetime.datetime.fromtimestamp(float(time_in_secs)) 

x = pandas.read_csv('data.csv',delimiter=';', parse_dates=True,date_parser=dateparse, index_col='DateTime', names=['DateTime', 'X'], header=None) 

out = x.truncate(before=datetime.datetime(2015,12,2,12,2,18))
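
For the follow-up about filtering by date, here is a minimal sketch. It assumes the index has already been converted to a DatetimeIndex, and it swaps in `pd.to_datetime(..., unit='s')` for the `dateparse` helper above, so the timestamps are read as UTC rather than local time. The cutoff `11:02:18` and the names `csv_text`, `cutoff`, `after_mask` and `after_slice` are illustrative assumptions, not part of the original answer.

```python
import io
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = """1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""

x = pd.read_csv(io.StringIO(csv_text), sep=';', header=None,
                names=['DateTime', 'X'], index_col='DateTime')

# Interpret the float index as seconds since the epoch (UTC).
x.index = pd.to_datetime(x.index, unit='s')

# Hypothetical cutoff, expressed in UTC to match the converted index.
cutoff = pd.Timestamp('2015-12-02 11:02:18')

# Option 1: boolean mask, strictly greater than the cutoff.
after_mask = x[x.index > cutoff]

# Option 2: label slicing on the sorted DatetimeIndex (inclusive of the cutoff).
after_slice = x.loc[cutoff:]

print(after_mask)
```

Both options select the same rows here because no row falls exactly on the cutoff. Label slicing of this kind only works once the datetime column is the index.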

Source

2015-12-06 20:52:19

Thanks a lot! Do you have an example of how to access the rows of `x` whose date is > 2015-12-02 12:02:18? (i.e. filtering by date) – Basj

The pandas solution is fairly simple. I have edited the solution. –

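Do you know why I cannot do as suggested [here](http://stackoverflow.com/a/22898920/1422096)? I should be able to do `x.ix['2015-12-02 12:02:18':'2015-12-31 23:59:59']` or `x.loc[...]`; why does it not work as suggested there? Is it because the datetime column is not an index? If so, how do I make it an "index"? – Basj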

You can parse the dates yourself:

import time 
import pandas as pd 

def date_parser(string_list): 
    return [time.ctime(float(x)) for x in string_list] 

df = pd.read_csv('data.csv', parse_dates=[0], sep=';', 
       date_parser=date_parser, 
       index_col='DateTime', 
       names=['DateTime', 'X'], header=None)

Result:

>>> df
                         X
DateTime
2015-12-02 12:02:16  15.31
2015-12-02 12:02:17  16.19
2015-12-02 12:02:18  19.22
2015-12-02 12:02:18  15.12
2015-12-02 12:02:19  13.12
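
Note that `time.ctime` formats the timestamp in the local time zone of the machine running the code, so the hour shown above depends on where it is run. If an explicit time zone is preferred, one possible sketch is to convert to UTC first and then to a named zone. The zone `'Europe/Paris'` and the names `stamps`, `utc_times` and `local_times` are assumptions for illustration only.

```python
import pandas as pd

# Two of the raw timestamps from data.csv (seconds since the epoch).
stamps = pd.Series([1449054136.83, 1449054139.25])

# Parse as timezone-aware UTC datetimes.
utc_times = pd.to_datetime(stamps, unit='s', utc=True)

# Convert to a named zone; 'Europe/Paris' is an assumed example zone.
local_times = utc_times.dt.tz_convert('Europe/Paris')

print(utc_times)
print(local_times)
```

This keeps the conversion independent of the machine's locale settings, at the cost of having to state the zone explicitly.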

Source

2015-12-06 20:50:38

Thanks a lot! Then (for the second part of the question), how do I access the part of `df` whose date is > 2015-12-02 12:02:18? (i.e. filtering) – Basj

Use to_datetime and pass unit='s' so that the values are parsed as unix timestamps; this will be much faster:

In [7]: 
pd.to_datetime(df.index, unit='s') 

Out[7]: 
DatetimeIndex(['2015-12-02 11:02:16.830000', '2015-12-02 11:02:17.430000', 
       '2015-12-02 11:02:18.040000', '2015-12-02 11:02:18.650000', 
       '2015-12-02 11:02:19.250000'], 
       dtype='datetime64[ns]', name=0, freq=None)
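
Putting this together with the original question, a minimal end-to-end sketch might look like the following. It assumes the data from the question is inlined as `csv_text` (a hypothetical name), and it uses the lowercase alias `'15s'` and `.mean()` as the aggregation, which the question does not specify.

```python
import io
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = """1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""

# Read the first column as plain floats and use it as the index.
df = pd.read_csv(io.StringIO(csv_text), sep=';', header=None,
                 names=['DateTime', 'X'], index_col='DateTime')

# Convert the float index (seconds since the epoch) to a DatetimeIndex.
df.index = pd.to_datetime(df.index, unit='s')

# resample now works because the index is a DatetimeIndex.
# The mean is an assumed aggregation, chosen only for illustration.
resampled = df.resample('15s').mean()

print(resampled)
```

All five sample rows fall into the same 15-second bin, so the result has a single row.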

Timings:

In [9]: 
%%timeit 
import time 
def date_parser(string_list): 
    return [time.ctime(float(x)) for x in string_list] 
 
df = pd.read_csv(io.StringIO(t), parse_dates=[0], sep=';', 
       date_parser=date_parser, 
       index_col='DateTime', 
       names=['DateTime', 'X'], header=None) 
100 loops, best of 3: 4.07 ms per loop

and

In [12]: 
%%timeit 
t="""1449054136.83;15.31 
1449054137.43;16.19 
1449054138.04;19.22 
1449054138.65;15.12 
1449054139.25;13.12""" 
df = pd.read_csv(io.StringIO(t), header=None, sep=';', index_col=[0]) 
df.index = pd.to_datetime(df.index, unit='s') 
100 loops, best of 3: 1.69 ms per loop

So using to_datetime is more than 2x faster on this small dataset, and I expect it to scale much better than the other methods.

Source

2015-12-06 21:01:31 EdChum

I don't know why, but with unit='s' pandas loses microsecond precision (pandas 0.18.1). Passing `df.ts * 1000, unit='ms'` helps. –

@MikhailKorobov You will have to post the raw data and code that demonstrates this, otherwise I cannot comment – EdChum
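
The workaround mentioned in the comment above can be sketched as follows. The precision loss reported for pandas 0.18.1 is not reproduced here; the sketch only compares the two calls at millisecond resolution. The names `ts`, `from_seconds` and `from_millis` are illustrative assumptions.

```python
import pandas as pd

# Two raw timestamps from data.csv, as float seconds since the epoch.
ts = pd.Series([1449054136.83, 1449054137.43])

# Direct conversion, treating the values as seconds.
from_seconds = pd.to_datetime(ts, unit='s')

# Workaround from the comment: scale to milliseconds first.
from_millis = pd.to_datetime(ts * 1000, unit='ms')

# Floats cannot represent these values exactly, so compare after
# rounding both results to the nearest millisecond.
print(from_seconds.dt.round('ms'))
print(from_millis.dt.round('ms'))
```

Whether the two calls differ below the millisecond level depends on the pandas version and on floating-point rounding, so it is worth checking on the version in use.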
