Saving Pandas DataFrames to an HDF5 store: various errors

I just want to save some Pandas DataFrames in an HDF5 store (.h5 file). Below is the code I am using.
import numpy as np
import pandas as pd

# Fake data over N runs
Data_N = []
for n in range(5):
    Data_N.append(np.random.randn(5000,15,125))
# Create HDFStore object
store = pd.HDFStore('test.h5')
# For each run:
for n in range(len(Data_N)):
    Data = Data_N[n]
    # Pandas DataFrame for "flattened" fake data
    Data_subDFs = []
    nanbuff = np.nan*np.zeros((1,len(Data[0,0])))
    for i in range(len(Data)):
        Data_i = np.vstack((nanbuff,Data[i,:,:]))
        Data_subDFs.append(pd.DataFrame(data = Data_i))
    Data_DF = pd.concat(Data_subDFs)
    # Row and column labels for the DataFrame
    Data_rows = []
    for i in range(len(Data)):
        # list(...) so that the concatenation also works on Python 3
        Data_rows.append(['Layer %d:' % (i+1)] + list(range(1,len(Data[0])+1)))
    Data_DF.index = sum(Data_rows,[])
    Data_DF.columns = range(1,len(Data[0,0])+1)
    # Put Pandas DataFrame into store
    store.put('Data_DF_%d' % (n+1), Data_DF)
    #store.put('Data_DF_%d' % (n+1), Data_DF, format='table')
    #store.put('Data_DF_%d' % (n+1), Data_DF, format='table', data_columns=True)
# Save the HDF5 file
store.close()
This gives the following output:
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed-integer,key->axis1] [items->None]
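The `inferred_type->mixed-integer` part of this warning appears to refer to the row labels, which mix `'Layer k:'` strings with integer row numbers. Here is a minimal sketch of how that could be checked, assuming `pd.api.types.infer_dtype` reports the same classification that the warning quotes. The two label lists are small stand-ins, not the real index.

```python
import pandas as pd

# Same pattern as one layer of the index above: a string header followed by integers
mixed_labels = ['Layer 1:'] + list(range(1, 4))
# The same labels, all converted to strings
uniform_labels = [str(label) for label in mixed_labels]

mixed_kind = pd.api.types.infer_dtype(mixed_labels)
uniform_kind = pd.api.types.infer_dtype(uniform_labels)

print(mixed_kind)    # mixed-integer
print(uniform_kind)  # string
```

If the labels are all one type, the inferred type is no longer `mixed-integer`, which is the situation the warning complains about.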
If I use the second version of the put, it gives:
TypeError: Passing an incorrect value to a table column. Expected a Col (or
subclass) instance and got: "ObjectAtom()". Please make use of the Col(), or
descendant, constructor to properly initialize columns.
If I use the third version of the put, it gives:
ValueError: cannot have non-object label DataIndexableCol
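Can someone please explain the errors for the different versions, and why I can't save what I think is a valid Pandas DataFrame in HDF5 without pickling?
If it helps, I don't think I need to be able to append to the DataFrame/store. I just want the best way to save the DFs using the Pandas HDF5 interface.
Thanks!
EDIT 1:
I updated the code after "For each run:" to this: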
# For each run:
for run in range(len(Data_N)):
    Data = Data_N[run]
    l = len(Data)
    m = len(Data[0])
    n = len(Data[0,0])
    # Pandas DataFrame for "flattened" fake data
    Data_subDFs = []
    for i in range(len(Data)):
        Data_i = Data[i,:,:]
        Data_subDFs.append(pd.DataFrame(data = Data_i))
    Data_DF = pd.concat(Data_subDFs)
    # Row and column labels for the DataFrame
    L1 = np.zeros((l*m,1), dtype=object) # Layer number
    L2 = np.zeros((l*m,1), dtype=object) # Row number
    for i in range(l):
        for j in range(m):
            L1[i*m + j,0] = 'Layer %d' % (i+1)
            L2[i*m + j,0] = '%d' % (j+1)
    Data_DF.index = np.hstack((L1,L2))
    Data_DF.columns = range(1,n+1)
    # Put Pandas DataFrame into store
    store.put('Data_DF_%d' % (run+1), Data_DF)
    #store.put('Data_DF_%d' % (run+1), Data_DF, format='table')
    #store.put('Data_DF_%d' % (run+1), Data_DF, format='table', data_columns=True)
However, this gives the same warning or error for each of the put lines.
EDIT 2 (this works!):
# For each run:
for run in range(len(Data_N)):
    Data = Data_N[run]
    l = len(Data)
    m = len(Data[0])
    n = len(Data[0,0])
    # Pandas DataFrame for "flattened" fake data
    Data_DF = pd.DataFrame(Data.reshape(l*m,n))
    # Layer and row labels
    layers = np.arange(1,l+1)
    rows = np.arange(1,m+1)
    # Pandas multi-index
    mindex = pd.MultiIndex.from_product([layers,rows], names=['Layer','Row'])
    # DataFrame multi-index and column labels
    Data_DF.index = mindex
    Data_DF.columns = range(1,n+1)
    # Put Pandas DataFrame into store
    store.put('Data_DF_%d' % (run+1), Data_DF)
    #store.put('Data_DF_%d' % (run+1), Data_DF, format='table')
    #store.put('Data_DF_%d' % (run+1), Data_DF, format='table', data_columns=True)
The third put line still gives the same error, but since the second line works, I assume the third line is just an invalid command in this case.
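One possible reading of "cannot have non-object label DataIndexableCol" is that `data_columns=True` expects string column names, while the columns here are the integers 1..n. This is an assumption and has not been verified against the third put line. The sketch below only shows the relabelling step on a small stand-in frame (the name `frame` is hypothetical).

```python
import numpy as np
import pandas as pd

# Small stand-in for Data_DF with integer column labels 1..3, as in the code above
frame = pd.DataFrame(np.random.randn(4, 3), columns=range(1, 4))

# Assumption: string labels are what data_columns=True wants, so relabel before the put
frame.columns = [str(c) for c in frame.columns]

all_strings = all(isinstance(c, str) for c in frame.columns)
print(list(frame.columns), all_strings)
```

Whether the third put then succeeds would still need to be tried against the real store.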
The second put line is much faster than the first, and both are much faster than the pickling route. Thanks!
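The EDIT 2 approach relies on `Data.reshape(l*m, n)` producing rows in the same order that `pd.MultiIndex.from_product` produces (Layer, Row) pairs. A minimal sketch of a sanity check for that alignment, using small stand-in sizes instead of 5000 x 15 x 125:

```python
import numpy as np
import pandas as pd

l, m, n = 3, 4, 5  # small stand-in sizes: layers, rows per layer, columns
Data = np.random.randn(l, m, n)

# Same construction as in EDIT 2
Data_DF = pd.DataFrame(Data.reshape(l * m, n), columns=range(1, n + 1))
Data_DF.index = pd.MultiIndex.from_product(
    [np.arange(1, l + 1), np.arange(1, m + 1)], names=['Layer', 'Row'])

# Layer 2, Row 3 (1-based labels) should be Data[1, 2, :] (0-based indices)
row = Data_DF.loc[(2, 3)].to_numpy()
labels_match = np.array_equal(row, Data[1, 2, :])
print(labels_match)
```

Both the default reshape and `from_product` vary the last level fastest, so the labels and the data rows stay in step.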
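Hi, thanks for your suggestion. I edited my post to try using a multi-index, but it didn't work. Any other suggestions? Did I implement it incorrectly? Thanks – leka0024
Thanks! This works (I updated my original post again). I accepted your answer as correct and tried to upvote it, but I don't have enough reputation. Thanks again! – leka0024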