2013-09-27

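First row missing after resampling

When I resample some data, pandas drops the first row. Please see the example below. Note that if the first timestamp is moved forward by 1 second, it works as expected.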

I am using pandas 0.10.1.

import pandas as pd 

from datetime import datetime 
from StringIO import StringIO 


f = StringIO('''\ 
time,value 
2011-06-03 00:00:05,0 
2011-06-03 00:01:05,1 
2011-06-03 00:02:05,2 
''') 

series = pd.read_csv(f, parse_dates=True, index_col=0)['value'] 

print series 
# time 
# 2011-06-03 00:00:05 0 
# 2011-06-03 00:01:05 1 
# 2011-06-03 00:02:05 2 
# Name: value 

# Problem resampling: 1st sample is missing 

print series.resample('s') 
# time 
# 2011-06-03 00:00:06 NaN 
# 2011-06-03 00:00:07 NaN 
# 2011-06-03 00:00:08 NaN 
# 2011-06-03 00:00:09 NaN 
# ... 
# 2011-06-03 00:01:52 NaN 
# 2011-06-03 00:02:03 NaN 
# 2011-06-03 00:02:04 NaN 
# 2011-06-03 00:02:05  2 
# 2011-06-03 00:02:06 NaN 
# Freq: S, Name: value, Length: 121 

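When I run this code, the first line of the output is "2011-06-03 00:00:05 0", i.e. the first sample is not missing. Perhaps this was a problem in an earlier version of pandas (though I have not heard of it). Can you tell us which version you are using? `pd.__version__`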

Thanks @DanAllan, I am using 0.10.1; it looks like this was fixed in 0.12. – user1802187

Answer

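The default for the closed parameter was changed in 0.11, see here. I am not sure whether there was a bug there. You can try specifying the closed interval explicitly.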
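Specifying the closed side explicitly might look like the sketch below. It is written against a recent pandas release, which is an assumption: there `resample` returns a resampler object that needs an explicit aggregation such as `.mean()`, whereas in the 0.10 to 0.12 series discussed here `resample` aggregated with the mean by default. The `label="left"` argument and the variable names are illustrative choices, not part of the original question.

```python
import io

import pandas as pd

# Same three samples as in the question.
csv_data = io.StringIO(
    "time,value\n"
    "2011-06-03 00:00:05,0\n"
    "2011-06-03 00:01:05,1\n"
    "2011-06-03 00:02:05,2\n"
)
series = pd.read_csv(csv_data, parse_dates=True, index_col=0)["value"]

# Print the installed version, since the default for `closed` has changed
# between releases.
print(pd.__version__)

# Request left-closed, left-labelled one-second bins explicitly instead of
# relying on the version-dependent default. Empty bins become NaN.
resampled = series.resample("s", closed="left", label="left").mean()

print(resampled.head())
print(len(resampled), "rows,", int(resampled.notna().sum()), "non-null")
```

Passing `closed` explicitly keeps the binning the same regardless of which default the installed version uses, so the first sample at 00:00:05 falls into the first bin.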

The current pandas version is 0.12 (0.13 is coming soon). Your best option is to upgrade.

As of 0.12 this looks fine. The default is closed='left'.

In [11]: df 
Out[11]: 
        value 
time      
2011-06-03 00:00:05  0 
2011-06-03 00:01:05  1 
2011-06-03 00:02:05  2 

In [12]: df.index 
Out[12]: 
<class 'pandas.tseries.index.DatetimeIndex'> 
[2011-06-03 00:00:05, ..., 2011-06-03 00:02:05] 
Length: 3, Freq: None, Timezone: None 

In [13]: df.resample('1s') 
Out[13]: 
<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 121 entries, 2011-06-03 00:00:05 to 2011-06-03 00:02:05 
Freq: S 
Data columns (total 1 columns): 
value 3 non-null values 
dtypes: float64(1) 

In [14]: df.resample('1s').head() 
Out[14]: 
        value 
time      
2011-06-03 00:00:05  0 
2011-06-03 00:00:06 NaN 
2011-06-03 00:00:07 NaN 
2011-06-03 00:00:08 NaN 
2011-06-03 00:00:09 NaN