2015-04-07

Pandas HDFStore select from nested columns

I have the following DataFrame, which is stored in an HDFStore object as a frame_table called data:

       shipmentid  qty
catid                1  2  3  4  5
0               0    0  0  0  0  0
1               1    0  0  0  2  0
2               2    2  0  0  0  0
3               3    0  4  0  0  0
0               0    0  0  0  0  0

I want to run store.select('data','shipmentid==2'), but I get an error saying that 'shipmentid' is not defined:

ValueError: The passed where expression: shipmentid==2 
      contains an invalid variable reference 
      all of the variable refrences must be a reference to 
      an axis (e.g. 'index' or 'columns'), or a data_column 
      The currently defined references are: columns,index 

What is the correct way to write this select statement?

EDIT: added a code example

import pandas as pd 
from pandas import * 
import random 

def createFrame(): 
    data = { 
      ('shipmentid',''):{1:1,2:2,3:3}, 
      ('qty',1):{1:5,2:5,3:5}, 
      ('qty',2):{1:6,2:6,3:6}, 
      ('qty',3):{1:7,2:7,3:7} 
      } 
    frame = pd.DataFrame(data) 

    return frame 

def createStore(): 
    store = pd.HDFStore('sample.h5',format='table') 
    return store  

frame = createFrame() 
print(frame) 
print('\n') 
print(frame.info()) 

store = createStore() 
store.put('data',frame,format='t') 
print('\n') 
print(store) 

results = store.select('data','shipmentid == 2') 

store.close() 

Answer

I bet you are creating your store with something like this:

In [207]: 

import numpy as np 
import pandas as pd 

data = pd.DataFrame(np.random.randn(8,2), columns=['shipmentid', 'qty']) 
store = pd.HDFStore('borrar') 
store.put('data', data, format='t') 

If you then try a select, you do indeed get the error you describe:

In [208]: 

store.select('data', 'shipmentid>0') 

--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-211-5d0c4082cdcf> in <module>() 
----> 1 store.select('data', 'shipmentid>0') 

... 

ValueError: The passed where expression: shipmentid>0 
      contains an invalid variable reference 
      all of the variable refrences must be a reference to 

Instead, you can create it like this:

In [209]: 

data = pd.DataFrame(np.random.randn(8,2), columns=['shipmentid', 'qty']) 
data.to_hdf('borrar2', 'data', append=True, mode='w', data_columns=['shipmentid', 'qty']) 

In [210]: 

pd.read_hdf('borrar2', 'data', where='shipmentid>0') 

Out[210]: 
   shipmentid       qty
1    0.778225 -1.008529
5    0.264075 -0.651268
7    0.908880  0.153306

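(Honestly, I don't know why it works one way and not the other. My guess is that in the first case you can't specify the data columns. These things can drive you crazy...)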
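For what it's worth, the error message itself hints at the difference: a where expression may only reference an axis or a data_column, and the first snippet never declares any. Below is a minimal sketch of that idea, assuming a pandas version whose put() accepts data_columns and that PyTables is installed. The file name demo.h5, the temporary directory, and the fixed random seed are made up for the illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed only so the demo is repeatable

# Hypothetical demo frame; the column names mirror the snippets above.
data = pd.DataFrame(np.random.randn(8, 2), columns=['shipmentid', 'qty'])

# Throwaway file so the demo does not touch any real store.
path = os.path.join(tempfile.mkdtemp(), 'demo.h5')

with pd.HDFStore(path, mode='w') as store:
    # Same put() call as before, but with shipmentid declared as a data column.
    store.put('data', data, format='table', data_columns=['shipmentid'])
    # shipmentid is now a valid reference in a where expression.
    positive = store.select('data', 'shipmentid > 0')

# Equivalent in-memory filter, for comparison with the on-disk query.
expected = data[data['shipmentid'] > 0]
print(positive)
```

If this behaves the same on your version, then put() versus to_hdf() is not what matters; what matters is whether the column was listed in data_columns when the table was written.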

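EDIT: After the updated code was posted, it turns out the DataFrame has a MultiIndex on its columns. The analogous code for the updated example would look like this: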

In [273]: 

import pandas as pd 
from pandas import * 
import random 

def createFrame(): 
    data = { 
      ('shipmentid',''):{1:1,2:2,3:3}, 
      ('qty',1):{1:5,2:5,3:5}, 
      ('qty',2):{1:6,2:6,3:6}, 
      ('qty',3):{1:7,2:7,3:7} 
      } 
    frame = pd.DataFrame(data) 

    return frame 

frame = createFrame() 
print(frame) 
print('\n') 
print(frame.info()) 

frame.to_hdf('sample.h5', 'data', append=True, mode='w', data_columns=['shipmentid'], format='table') 
pd.read_hdf('sample.h5','data', 'shipmentid == 2') 

But I get an error (I guess you will get the same one):

  qty       shipmentid
    1  2  3           
1   5  6  7          1
2   5  6  7          2
3   5  6  7          3


<class 'pandas.core.frame.DataFrame'> 
Int64Index: 3 entries, 1 to 3 
Data columns (total 4 columns): 
(qty, 1)   3 non-null int64 
(qty, 2)   3 non-null int64 
(qty, 3)   3 non-null int64 
(shipmentid,) 3 non-null int64 
dtypes: int64(4) 
memory usage: 120.0 bytes 
None 
--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-273-e10e811fc7c0> in <module>() 
    23 print(frame.info()) 
    24 
---> 25 frame.to_hdf('sample.h5', 'data', append=True, mode='w', data_columns=['shipmentid'], format='table') 
    26 pd.read_hdf('sample.h5','data', 'shipmentid == 2') 
..... 
stack trace 
..... 
ValueError: cannot use a multi-index on axis [1] with data_columns ['shipmentid'] 

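I have looked around a bit and I can't offer a solution for this. My impression from looking at the code on GitHub is that the data_columns option cannot be combined with a MultiIndex. The only workaround I can think of is to write to the HDFStore (as in your code), then read back the complete DataFrame without any condition and do the search as a post-processing step. That is:

new_frame = store.get('data') 
print(new_frame[new_frame['shipmentid'] == 2]) 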



<class 'pandas.io.pytables.HDFStore'> 
File path: sample.h5 
/data   frame_table (typ->appendable,nrows->3,ncols->4,indexers->[index]) 
  qty       shipmentid
    1  2  3           
2   5  6  7          2
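Another possibility, which is not something I found in the pandas code and is offered only as an assumption-laden sketch, is to flatten the nested columns into plain strings before writing. With ordinary column names the data_columns restriction should no longer apply, so the query can run on disk. The helper flatten_columns, the qty_1 style names, and the file sample_flat.h5 are all hypothetical; PyTables is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Same nested-column frame as in the question.
frame = pd.DataFrame({
    ('shipmentid', ''): {1: 1, 2: 2, 3: 3},
    ('qty', 1): {1: 5, 2: 5, 3: 5},
    ('qty', 2): {1: 6, 2: 6, 3: 6},
    ('qty', 3): {1: 7, 2: 7, 3: 7},
})


def flatten_columns(columns):
    """Join the levels of each column tuple with '_', skipping empty parts.

    Hypothetical helper: ('qty', 1) becomes 'qty_1', ('shipmentid', '')
    becomes 'shipmentid'.
    """
    return ['_'.join(str(part) for part in col if part != '') for col in columns]


# Work on a copy so the original nested frame stays untouched.
flat = frame.copy()
flat.columns = flatten_columns(frame.columns)

# Throwaway file for the demo.
path = os.path.join(tempfile.mkdtemp(), 'sample_flat.h5')

# With flat column names, shipmentid can be declared as a data column.
flat.to_hdf(path, key='data', mode='w', format='table', data_columns=['shipmentid'])

# The filter is now evaluated by the store instead of in memory.
result = pd.read_hdf(path, 'data', where='shipmentid == 2')
print(result)
```

The trade-off is that the nested structure is lost on disk; if you need it back, you would have to rebuild the MultiIndex from the flattened names after reading.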

The problem seems to stem from the use of nested columns. See the full example code I just added. – TraxusIV


Updated the answer, although it may no longer really be an answer. Hope it helps anyway. – lrnzcig