2014-04-30

pandas DataFrame indexing fails, but Series indexing succeeds

Given the US market hours:

In [220]: market_hours = pandas.date_range(date + ' 09:30:00', date + ' 16:00:00', freq='15min', tz='US/Eastern').tz_convert('UTC') 

In [221]: market_hours 
Out[221]: 
<class 'pandas.tseries.index.DatetimeIndex'> 
[2014-04-29 13:30:00+00:00, ..., 2014-04-29 20:00:00+00:00] 
Length: 27, Freq: 15T, Timezone: UTC 
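
For anyone reproducing this, here is a minimal self-contained sketch of the same index. The session above never shows how `date` is defined, so the plain 'YYYY-MM-DD' string below is an assumption.

```python
import pandas as pd

# Assumption: `date` is a plain 'YYYY-MM-DD' string (not shown in the session above).
date = "2014-04-29"

# Build 15-minute bins over the regular US trading session in Eastern time,
# then convert them to UTC so they line up with a UTC-indexed frame.
market_hours = pd.date_range(
    date + " 09:30:00", date + " 16:00:00", freq="15min", tz="US/Eastern"
).tz_convert("UTC")

print(len(market_hours))
print(market_hours[0], market_hours[-1])
```

The range includes both endpoints, which is why the 16:00 Eastern (20:00 UTC) bin appears in the output.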

I can resample() a single field and restrict the result to these market hours:

In [222]: df.set_index('localtime')['size'].resample('15min', how='sum')[market_hours] 
Out[222]: 
2014-04-29 13:30:00+00:00 1093142 
2014-04-29 13:45:00+00:00  556664 
2014-04-29 14:00:00+00:00  467662 
2014-04-29 14:15:00+00:00  460966 
2014-04-29 14:30:00+00:00  275805 
2014-04-29 14:45:00+00:00  192709 
2014-04-29 15:00:00+00:00  226375 
2014-04-29 15:15:00+00:00  175065 
2014-04-29 15:30:00+00:00  181047 
2014-04-29 15:45:00+00:00  129644 
2014-04-29 16:00:00+00:00  193330 
2014-04-29 16:15:00+00:00  170046 
2014-04-29 16:30:00+00:00  130674 
2014-04-29 16:45:00+00:00  107118 
2014-04-29 17:00:00+00:00  156699 
2014-04-29 17:15:00+00:00  153912 
2014-04-29 17:30:00+00:00  180449 
2014-04-29 17:45:00+00:00  223318 
2014-04-29 18:00:00+00:00  211324 
2014-04-29 18:15:00+00:00  152374 
2014-04-29 18:30:00+00:00  121876 
2014-04-29 18:45:00+00:00  90891 
2014-04-29 19:00:00+00:00  138222 
2014-04-29 19:15:00+00:00  167571 
2014-04-29 19:30:00+00:00  264658 
2014-04-29 19:45:00+00:00  492528 
2014-04-29 20:00:00+00:00  8354 
Freq: 15T, Name: size, dtype: int64 

However, if I try to resample() a list of fields, I get an error:

In [223]: df.set_index('localtime')[['size']].resample('15min', how='sum')[market_hours] 
... 

KeyError: "['2014-04-29T09:30:00.000000000-0400' '2014-04-29T09:45:00.000000000-0400'\n '2014-04-29T10:00:00.000000000-0400' '2014-04-29T10:15:00.000000000-0400'\n '2014-04-29T10:30:00.000000000-0400' '2014-04-29T10:45:00.000000000-0400'\n '2014-04-29T11:00:00.000000000-0400' '2014-04-29T11:15:00.000000000-0400'\n '2014-04-29T11:30:00.000000000-0400' '2014-04-29T11:45:00.000000000-0400'\n '2014-04-29T12:00:00.000000000-0400' '2014-04-29T12:15:00.000000000-0400'\n '2014-04-29T12:30:00.000000000-0400' '2014-04-29T12:45:00.000000000-0400'\n '2014-04-29T13:00:00.000000000-0400' '2014-04-29T13:15:00.000000000-0400'\n '2014-04-29T13:30:00.000000000-0400' '2014-04-29T13:45:00.000000000-0400'\n '2014-04-29T14:00:00.000000000-0400' '2014-04-29T14:15:00.000000000-0400'\n '2014-04-29T14:30:00.000000000-0400' '2014-04-29T14:45:00.000000000-0400'\n '2014-04-29T15:00:00.000000000-0400' '2014-04-29T15:15:00.000000000-0400'\n '2014-04-29T15:30:00.000000000-0400' '2014-04-29T15:45:00.000000000-0400'\n '2014-04-29T16:00:00.000000000-0400'] not in index" 

Is there a way to restrict the resulting DataFrame to the date range? This doesn't seem to have anything to do with timezones.

Answer

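In the first case you are indexing a Series. In the second case (using df[['size']].resample(.., note the double square brackets) you are working with a DataFrame.
Basic indexing on a DataFrame (df[labels]) selects columns, not rows (see http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics). That is why you get an error saying the labels are not in the (column) index.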

To get around this, you can use loc (assuming result is the output of the resample):

result.loc[market_hours, :]
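
As a rough end-to-end sketch of the difference, the snippet below uses a made-up frame (one row per minute over a UTC day, with hypothetical `size` values) in place of the question's data. It also uses the `.resample('15min').sum()` spelling, since the `how=` argument shown in the question has been removed from recent pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: one row per minute over a UTC day.
localtime = pd.date_range("2014-04-29", periods=24 * 60, freq="min", tz="UTC")
df = pd.DataFrame({"localtime": localtime, "size": np.arange(len(localtime))})

market_hours = pd.date_range(
    "2014-04-29 09:30:00", "2014-04-29 16:00:00", freq="15min", tz="US/Eastern"
).tz_convert("UTC")

# Single brackets give a Series; indexing it with labels selects rows.
series_result = df.set_index("localtime")["size"].resample("15min").sum()
series_restricted = series_result[market_hours]

# Double brackets give a DataFrame; plain [] then looks the labels up in the columns.
result = df.set_index("localtime")[["size"]].resample("15min").sum()
try:
    result[market_hours]
    raised = False
except KeyError:
    raised = True  # the timestamps are not column names

# .loc matches the labels against the row index instead.
restricted = result.loc[market_hours, :]
print(raised, restricted.shape)
```

The second positional argument (`:`) is optional here; `result.loc[market_hours]` selects the same rows.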