2012-11-19 48 views
1

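I'm not sure whether this is a bug or by design. Perhaps I'm missing something and the ohlc aggregator isn't supposed to work with DataFrames. Maybe the behavior is intentional, because a DataFrame with anything other than an index column and a price column could produce odd results? Other aggregators (mean, stdev, etc.) do work with DataFrames. In any case, I'm trying to get OHLC out of this data, and converting to a TimeSeries doesn't seem to work either. Does the OHLC aggregator not work with a DataFrame in pandas?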

Here is an example:

import pandas as pd
from numpy.random import randint  # randint is used below but was never imported
rng = pd.date_range('1/1/2012', periods=1000, freq='S') 

ts = pd.Series(randint(0, 500, len(rng)), index=rng) 
df = pd.DataFrame(randint(0,500, len(rng)), index=rng) 

ts.resample('5Min', how='ohlc') # works great 
df.resample('5Min', how='ohlc') # throws a "NotImplementedError" 

newts = pd.TimeSeries(df) # am I missing an index command in this line?
# the line above yields this error:
# "TypeError: Only valid with DatetimeIndex or PeriodIndex"

Full NotImplementedError paste: 

NotImplementedError      Traceback (most recent call last) 
/home/jeff/<ipython-input-7-85a274cc0d8c> in <module>() 
----> 1 df.resample('5Min', how='ohlc') 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base) 
    231        fill_method=fill_method, convention=convention, 
    232        limit=limit, base=base) 
--> 233   return sampler.resample(self) 
    234 
    235  def first(self, offset): 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/tseries/resample.pyc in resample(self, obj) 
    66 
    67   if isinstance(axis, DatetimeIndex): 
---> 68    rs = self._resample_timestamps(obj) 
    69   elif isinstance(axis, PeriodIndex): 
    70    offset = to_offset(self.freq) 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/tseries/resample.pyc in _resample_timestamps(self, obj) 
    189    if len(grouper.binlabels) < len(axlabels) or self.how is not None: 
    190     grouped = obj.groupby(grouper, axis=self.axis) 
--> 191     result = grouped.aggregate(self._agg_method) 
    192    else: 
    193     # upsampling shortcut 


/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs) 
    1538   """ 
    1539   if isinstance(arg, basestring): 
-> 1540    return getattr(self, arg)(*args, **kwargs) 
    1541 
    1542   result = {} 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in ohlc(self) 
    384   For multiple groupings, the result index will be a MultiIndex 
    385   """ 
--> 386   return self._cython_agg_general('ohlc') 
    387 
    388  def nth(self, n): 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only) 
    1452 
    1453  def _cython_agg_general(self, how, numeric_only=True): 
-> 1454   new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only) 
    1455   return self._wrap_agged_blocks(new_blocks) 
    1456 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _cython_agg_blocks(self, how, numeric_only) 
    1490     values = com.ensure_float(values) 
    1491 
-> 1492    result, _ = self.grouper.aggregate(values, how, axis=agg_axis) 
    1493    newb = make_block(result, block.items, block.ref_items) 
    1494    new_blocks.append(newb) 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in aggregate(self, values, how, axis) 
    730     values = values.swapaxes(0, axis) 
    731    if arity > 1: 
--> 732     raise NotImplementedError 
    733    out_shape = (self.ngroups,) + values.shape[1:] 
    734 

NotImplementedError: 
+0

Sounds like it isn't implemented (yet)... –

+0

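That may well be the case, Hayden. If so, I guess I'll have to figure out how to properly convert my DataFrame into a time series that can be resampled. So far I haven't had any success. – Jeff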

+1

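I was able to get the result I wanted by converting my DataFrame to a time series with this command: "ts = pd.TimeSeries(df[0])", after which I could resample the time series. Not as elegant as doing it directly from the DataFrame, but it works for now. – Jeff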

Answers

3

You can resample a single column (since each column is a time series):

In [9]: df[0].resample('5Min', how='ohlc') 
Out[9]: 
        open high low close 
2012-01-01 00:00:00 136 136 136 136 
2012-01-01 00:05:00 462 499 0 451 
2012-01-01 00:10:00 209 499 0 495 
2012-01-01 00:15:00 25 499 0 344 
2012-01-01 00:20:00 200 498 0 199 


In [10]: type(df[0]) 
Out[10]: pandas.core.series.TimeSeries 
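
For readers on a newer pandas, here is a minimal sketch of the same single-column approach. It assumes a recent release in which the `how=` keyword is no longer accepted and the aggregation is called as a method on the resampler; the lowercase frequency aliases `"s"` and `"5min"` are likewise an assumption about that release. The number of rows may differ from the output above, since the default bin edges appear to differ in the older version used there.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question: 1000 one-second samples.
rng = pd.date_range("2012-01-01", periods=1000, freq="s")
df = pd.DataFrame(np.random.randint(0, 500, len(rng)), index=rng)

# Select the single column (a Series with a DatetimeIndex),
# then aggregate each 5-minute bin into open/high/low/close.
ohlc = df[0].resample("5min").ohlc()

print(ohlc)
```

The result is an ordinary DataFrame with the columns `open`, `high`, `low` and `close`, one row per 5-minute bin.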

It's not clear to me how this should be output for larger DataFrames (with multiple columns), but perhaps you could build a Panel:

In [11]: newts = Panel(dict((col, df[col].resample('5Min', how='ohlc')) 
           for col in df.columns)) 

In [12]: newts[0] 
Out[12]: 
        open high low close 
2012-01-01 00:00:00 136 136 136 136 
2012-01-01 00:05:00 462 499 0 451 
2012-01-01 00:10:00 209 499 0 495 
2012-01-01 00:15:00 25 499 0 344 
2012-01-01 00:20:00 200 498 0 199 
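
`Panel` was later removed from pandas, so the snippet above will not run on current releases. As a rough alternative under that assumption, the per-column OHLC frames can be concatenated side by side instead; the dict keys then become the outer level of a column MultiIndex. The column names `"a"` and `"b"` are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-01-01", periods=1000, freq="s")
# Two columns this time (hypothetical names) to show the multi-column case.
df = pd.DataFrame(
    np.random.randint(0, 500, (len(rng), 2)), index=rng, columns=["a", "b"]
)

# One OHLC frame per column; concat along axis=1 glues them side by side
# and uses the dict keys as the outer column level.
per_column = {col: df[col].resample("5min").ohlc() for col in df.columns}
wide = pd.concat(per_column, axis=1)

# Selecting the outer key gives the OHLC table for that column,
# analogous to newts[0] in the Panel version.
print(wide["a"])
```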

Note: perhaps there is a canonical output for OHLC-resampling a DataFrame that simply hasn't been implemented yet?
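
For what it's worth, more recent pandas releases do seem to accept a whole DataFrame here. The sketch below assumes such a release; the column names `"a"` and `"b"` are again hypothetical. Calling `ohlc()` on the DataFrame resampler returns one open/high/low/close group per original column, arranged as a column MultiIndex.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-01-01", periods=1000, freq="s")
df = pd.DataFrame(
    np.random.randint(0, 500, (len(rng), 2)), index=rng, columns=["a", "b"]
)

# Resample the whole DataFrame at once; each original column
# expands into its own open/high/low/close sub-columns.
result = df.resample("5min").ohlc()

print(result.columns.tolist())
```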

+0

Resampling a single column is exactly what I was looking for. Works perfectly. Thanks, Hayden. – Jeff