2013-11-15

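Python pandas time series resample function extends the time index

I've been playing around with some financial time series data in pandas and am trying to resample some timestamped data. This is the starting data: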

start_data 

Out[12]: 
<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 749880 entries, 2012-07-06 03:00:00 to 2013-09-11 23:59:00 
Data columns (total 1 columns): 
TickMean 749880 non-null values 
dtypes: float64(1) 

start_data.TickMean 
Out[18]: 
2012-07-06 03:00:00 1.541194 
2012-07-06 03:01:00 1.541216 
2012-07-06 03:02:00 1.541201 
2012-07-06 03:03:00 1.541088 
2012-07-06 03:04:00 1.540999 
2012-07-06 03:05:00 1.541011 
2012-07-06 03:06:00 1.541090 
2012-07-06 03:07:00 1.541256 
2012-07-06 03:08:00 1.541341 
2012-07-06 03:09:00 1.541386 
2012-07-06 03:10:00 1.541511 
2012-07-06 03:11:00 1.541469 
2012-07-06 03:12:00 1.541506 
2012-07-06 03:13:00 1.541584 
2012-07-06 03:14:00 1.541453 
... 
2013-09-11 23:45:00 1.602015 
2013-09-11 23:46:00 1.602015 
2013-09-11 23:47:00 1.602015 
2013-09-11 23:48:00 1.602015 
2013-09-11 23:49:00 1.602015 
2013-09-11 23:50:00 1.602015 
2013-09-11 23:51:00 1.602015 
2013-09-11 23:52:00 1.602015 
2013-09-11 23:53:00 1.602015 
2013-09-11 23:54:00 1.602015 
2013-09-11 23:55:00 1.602015 
2013-09-11 23:56:00 1.602015 
2013-09-11 23:57:00 1.602015 
2013-09-11 23:58:00 1.602015 
2013-09-11 23:59:00 1.602015 
Name: TickMean, Length: 749880, dtype: float64 

When I try to resample to 40 minutes, the time range expands:

start_data = start_data.resample('40min') 

start_data 
Out[14]: 
<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 25344 entries, 2012-01-07 00:00:00 to 2013-12-10 23:20:00 
Freq: 40T 
Data columns (total 1 columns): 
TickMean 18749 non-null values 
dtypes: float64(1) 

start_data.TickMean 

Out[15]: 
2012-01-07 00:00:00 1.5706 
2012-01-07 00:40:00 1.5706 
2012-01-07 01:20:00 1.5706 
2012-01-07 02:00:00 1.5706 
2012-01-07 02:40:00 1.5706 
2012-01-07 03:20:00 1.5706 
2012-01-07 04:00:00 1.5706 
2012-01-07 04:40:00 1.5706 
2012-01-07 05:20:00 1.5706 
2012-01-07 06:00:00 1.5706 
2012-01-07 06:40:00 1.5706 
2012-01-07 07:20:00 1.5706 
2012-01-07 08:00:00 1.5706 
2012-01-07 08:40:00 1.5706 
2012-01-07 09:20:00 1.5706 
... 
2013-12-10 14:00:00 1.594563 
2013-12-10 14:40:00 1.594796 
2013-12-10 15:20:00 1.594766 
2013-12-10 16:00:00 1.593523 
2013-12-10 16:40:00 1.593171 
2013-12-10 17:20:00 1.593702 
2013-12-10 18:00:00 1.595145 
2013-12-10 18:40:00 1.595796 
2013-12-10 19:20:00 1.595527 
2013-12-10 20:00:00 1.595099 
2013-12-10 20:40:00 1.595060 
2013-12-10 21:20:00 1.595575 
2013-12-10 22:00:00 1.595575 
2013-12-10 22:40:00 1.595575 
2013-12-10 23:20:00 1.595575 
Freq: 40T, Name: TickMean, Length: 25344, dtype: float64 
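The session above uses the 2013 API, where resample('40min') on a DataFrame aggregated with the mean by default. In current pandas, resample() returns a Resampler object and the aggregation has to be named explicitly. A minimal sketch on made-up one-minute data (the column name TickMean is borrowed from the question; the values are synthetic):

```python
import numpy as np
import pandas as pd

# Synthetic one-minute series standing in for TickMean (values are made up).
idx = pd.date_range("2012-07-06 03:00", periods=240, freq="min")
ticks = pd.DataFrame({"TickMean": np.linspace(1.5412, 1.5420, len(idx))}, index=idx)

# Current pandas: resample() returns a Resampler, so name the aggregation.
bars = ticks.resample("40min").mean()
print(bars)
```

The first label, 02:40, comes before the first timestamp because the 40-minute bins are anchored to midnight of the first day by default (origin='start_day'), not to the first observation.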

I feel like I'm missing something obvious. Why is it doing this?

Quick edit: I know a 40-minute frequency is odd, but other frequencies have the same effect.

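Edit 2: Yes, that was silly. I assumed the index would be sorted.

Edit 3: As a bonus for anyone who runs into a similarly strange problem: my dates were day-first, not month-first, which threw everything off as well. I fixed it with the dayfirst=True option: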

ask_data.index = pd.to_datetime(ask_data.index, dayfirst=True) 

ask_data 
Out[34]: 
<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 749880 entries, 2012-06-07 03:00:00 to 2013-11-09 23:59:00 
Data columns (total 5 columns): 
Open  749880 non-null values 
High  749880 non-null values 
Low  749880 non-null values 
Close  749880 non-null values 
Volume 749880 non-null values 
dtypes: float64(5) 

ask_data.index.min() 
Out[35]: Timestamp('2012-06-07 03:00:00', tz=None) 

ask_data.index.max() 
Out[36]: Timestamp('2013-11-09 23:59:00', tz=None) 
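To see why the day/month order matters, here is a sketch with two hypothetical day-first strings (they are invented; the question does not show the raw file). Without dayfirst, pandas reads the first field as the month; with dayfirst=True or an explicit format string the dates come out as intended:

```python
import pandas as pd

# Hypothetical raw timestamps written day-first (DD/MM/YYYY HH:MM).
raw = ["07/06/2012 03:00", "08/06/2012 03:00"]

# Default parsing treats the first field as the month: July 6 and August 6.
month_first = pd.to_datetime(raw)

# dayfirst=True treats the first field as the day: June 7 and June 8.
day_first = pd.to_datetime(raw, dayfirst=True)

# An explicit format removes the ambiguity entirely.
explicit = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")

print(month_first)
print(day_first)
```

Passing an explicit format is the most predictable option: every row must match it, so a malformed timestamp raises an error instead of being interpreted some other way.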

Answer


Are you sure your index is in order? You can check with:

print(start_data.index.min(), start_data.index.max(), start_data.index.is_monotonic_increasing)
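The check matters because the DataFrame summary reports the first and last rows, not the earliest and latest timestamps, while resample() always bins the full span from min to max. A small sketch with four made-up rows (the timestamps reuse dates from the question; the values are invented) reproduces the effect:

```python
import pandas as pd

# Hypothetical rows whose index is out of order, e.g. after a day/month mix-up.
idx = pd.to_datetime([
    "2012-07-06 03:00", "2012-01-07 00:00", "2013-12-10 23:20", "2013-09-11 23:59",
])
df = pd.DataFrame({"TickMean": [1.54, 1.57, 1.59, 1.60]}, index=idx)

# The first and last rows are not the earliest and latest timestamps.
print(df.index[0], df.index[-1])
print(df.index.min(), df.index.max(), df.index.is_monotonic_increasing)

# resample() bins the whole min..max span, so the output looks "extended".
resampled = df.resample("40min").mean()
print(resampled.index[0], resampled.index[-1], len(resampled))

# After sorting, the first and last rows match the real range.
df_sorted = df.sort_index()
print(df_sorted.index[0], df_sorted.index[-1])
```

With these four rows the 40-minute output runs from 2012-01-07 00:00 to 2013-12-10 23:20 in 25344 bins, the same range and length as the resampled frame in the question, even though the first and last rows are dated 2012-07-06 and 2013-09-11.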

Yes, that was it. Edit added, thanks! – user1644030
