I bet you created your store with something like this:
In [207]:
data = pd.DataFrame(np.random.randn(8,2), columns=['shipmentid', 'qty'])
store = pd.HDFStore('borrar')
store.put('data', data, format='t')
If you then try a select, you do indeed get the error you describe:
In [208]:
store.select('data', 'shipmentid>0')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-211-5d0c4082cdcf> in <module>()
----> 1 store.select('data', 'shipmentid>0')
...
ValueError: The passed where expression: shipmentid>0
contains an invalid variable reference
all of the variable refrences must be a reference to
Instead, you can create it like this:
In [209]:
data = pd.DataFrame(np.random.randn(8,2), columns=['shipmentid', 'qty'])
data.to_hdf('borrar2', 'data', append=True, mode='w', data_columns=['shipmentid', 'qty'])
In [210]:
pd.read_hdf('borrar2', 'data', where='shipmentid>0')
Out[210]:
   shipmentid       qty
1    0.778225 -1.008529
5    0.264075 -0.651268
7    0.908880  0.153306
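(Honestly, I don't know why it works one way and not the other. My guess is that in the first case you cannot specify the data columns. But these things can drive you crazy...)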
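One way to check that guess is the sketch below. As far as I can tell, `HDFStore.put` also accepts `data_columns`, so the difference may lie in whether the column was declared as a data column rather than in which method wrote the table. The helper name `write_and_query` and the temporary file name are made up for illustration, and it needs PyTables installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def write_and_query(path, declare_data_columns):
    """Write a small table with put() and try to filter on 'shipmentid'.

    Returns (original_frame, query_result); query_result is None when
    the where expression is rejected with a ValueError.
    """
    np.random.seed(0)  # fixed seed so repeated runs see the same data
    data = pd.DataFrame(np.random.randn(8, 2), columns=['shipmentid', 'qty'])
    with pd.HDFStore(path, mode='w') as store:
        if declare_data_columns:
            # Declaring the column makes it usable in a where expression.
            store.put('data', data, format='table',
                      data_columns=['shipmentid'])
        else:
            store.put('data', data, format='table')
        try:
            result = store.select('data', where='shipmentid > 0')
        except ValueError:
            result = None
    return data, result


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sketch.h5')  # hypothetical file name
    _, without_dc = write_and_query(path, declare_data_columns=False)
    data, with_dc = write_and_query(path, declare_data_columns=True)

print('query without data_columns worked:', without_dc is not None)
print('query with data_columns worked:', with_dc is not None)
```

If this behaves the way I expect, only the second call returns rows, which would mean the first store fails simply because `shipmentid` was never declared as a data column.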
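EDIT: After the updated code was posted, it turns out the DataFrame has a MultiIndex on its columns. The equivalent updated code would look like this: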
In [273]:
import pandas as pd
from pandas import *
import random
def createFrame():
    data = {
        ('shipmentid',''):{1:1,2:2,3:3},
        ('qty',1):{1:5,2:5,3:5},
        ('qty',2):{1:6,2:6,3:6},
        ('qty',3):{1:7,2:7,3:7}
    }
    frame = pd.DataFrame(data)
    return frame
frame = createFrame()
print(frame)
print('\n')
print(frame.info())
frame.to_hdf('sample.h5', 'data', append=True, mode='w', data_columns=['shipmentid'], format='table')
pd.read_hdf('sample.h5','data', 'shipmentid == 2')
But I get an error (I guess you get the same one):
  qty       shipmentid
    1  2  3
1   5  6  7          1
2   5  6  7          2
3   5  6  7          3
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 1 to 3
Data columns (total 4 columns):
(qty, 1) 3 non-null int64
(qty, 2) 3 non-null int64
(qty, 3) 3 non-null int64
(shipmentid,) 3 non-null int64
dtypes: int64(4)
memory usage: 120.0 bytes
None
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-273-e10e811fc7c0> in <module>()
23 print(frame.info())
24
---> 25 frame.to_hdf('sample.h5', 'data', append=True, mode='w', data_columns=['shipmentid'], format='table')
26 pd.read_hdf('sample.h5','data', 'shipmentid == 2')
.....
stack trace
.....
ValueError: cannot use a multi-index on axis [1] with data_columns ['shipmentid']
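I have looked around a bit and I can't offer a real solution for this. My impression from reading the code on GitHub is that the data_columns option cannot be combined with a MultiIndex on the columns. The only workaround I can think of is to write to the HDFStore (as in your code), then read the complete DataFrame back without any condition and do the filtering afterwards. That is: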
new_frame = store.get('data')
print(new_frame[new_frame['shipmentid'] == 2])
<class 'pandas.io.pytables.HDFStore'>
File path: sample.h5
/data frame_table (typ->appendable,nrows->3,ncols->4,indexers->[index])
  qty       shipmentid
    1  2  3
2   5  6  7          2
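For what it's worth, here is a minimal sketch of that post-filtering step on its own, without the HDF round trip, so it runs with pandas alone. The helper names `filter_in_memory` and `flatten_columns` are mine (hypothetical). Flattening the column MultiIndex is an extra idea I am assuming could help; it is not something from the question.

```python
import pandas as pd


def create_frame():
    # Same nested-column layout as in the question.
    data = {
        ('shipmentid', ''): {1: 1, 2: 2, 3: 3},
        ('qty', 1): {1: 5, 2: 5, 3: 5},
        ('qty', 2): {1: 6, 2: 6, 3: 6},
        ('qty', 3): {1: 7, 2: 7, 3: 7},
    }
    return pd.DataFrame(data)


def filter_in_memory(frame, shipment_id):
    # Address the column by its full tuple key so the mask is a Series.
    mask = frame[('shipmentid', '')] == shipment_id
    return frame[mask]


def flatten_columns(frame):
    # Join the non-empty levels of each column key, e.g. ('qty', 1) -> 'qty_1'.
    flat = frame.copy()
    flat.columns = [
        '_'.join(str(part) for part in col if part != '')
        for col in frame.columns
    ]
    return flat


frame = create_frame()
selected = filter_in_memory(frame, 2)
flat = flatten_columns(frame)

print(selected)
print(list(flat.columns))
```

With flat column names, I would expect `to_hdf(..., data_columns=['shipmentid'], format='table')` to get past the MultiIndex check, but I have not verified that, so treat it as a guess.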
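The problem seems to stem from using nested columns. See the complete sample code I just added. – TraxusIV
Updated the answer, though it may no longer really be an answer. Hope it helps anyway. – lrnzcig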