Converting from Excel to HDF5 using Pandas

I want to read the contents of an Excel document into a pandas DataFrame and then write that DataFrame to an HDF5 file. This is what I have so far:

import pandas as pd

xls_df = pd.read_excel(fn_xls)  # fn_xls: path to the Excel file
xls_df.to_hdf(fn_h5, key='table', format='table', mode='w')  # fn_h5: path of the HDF5 file to create

This results in the following error:

TypeError: Cannot serialize the column [Col1] because its data contents are [unicode] object dtype

I tried calling convert_objects() on the DataFrame read from the Excel file, but that did not work (and convert_objects() is deprecated). Does anyone have a suggestion?

Here is a little information about the Excel file:

<class 'pandas.core.frame.DataFrame'> 
RangeIndex: 101 entries, 0 to 100 
Data columns (total 5 columns): 
Col1     101 non-null object 
Col2     101 non-null object 
Col3     94 non-null float64 
Col4     98 non-null object 
Col5     93 non-null float64 
dtypes: float64(2), object(3)

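The first and second columns are strings, the fourth column contains one string but is otherwise integers, and the third and fifth columns are integers.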

Source

2016-09-02 PyNoob

Can you show some sample entries from the DataFrame?

The mix of string and integer data types in column "Col4" is what causes the error when converting to HDF5 in "table" format.
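To see which columns are affected before writing, something like the following sketch may help. The sample frame and the helper name `mixed_type_columns` are made up for illustration; they are not part of the question or of pandas.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data: Col4 is mostly integers with one string.
df = pd.DataFrame({
    "Col1": ["a", "b", "c"],
    "Col4": [1, "n/a", 3],
})

def mixed_type_columns(frame):
    """Return {column: set of Python type names} for columns holding more than one type."""
    mixed = {}
    for name in frame.columns:
        # Look at the Python type of every non-missing value in the column.
        kinds = set(frame[name].dropna().map(lambda v: type(v).__name__))
        if len(kinds) > 1:
            mixed[name] = kinds
    return mixed

print(mixed_type_columns(df))
```

Any column reported here is a candidate for one of the conversions discussed in this answer.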

To save to HDF5 in "table" format, either convert Col4 to numeric values (floats, with anything that is not a number becoming NaN):

df["Col4"] = pd.to_numeric(df["Col4"], errors="coerce")
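As a small illustration of what the coercion does, here is a sketch on a made-up three-row column (the values are assumptions, not data from the question):

```python
import pandas as pd

# Hypothetical sample: one stray string among integers, as described for Col4.
df = pd.DataFrame({"Col4": [10, "unknown", 30]})

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
df["Col4"] = pd.to_numeric(df["Col4"], errors="coerce")

print(df["Col4"].dtype)         # the column is now numeric
print(df["Col4"].isna().sum())  # how many entries ended up as NaN
```

Note that the original string is discarded, so comparing the NaN count before and after the conversion shows how many values were lost.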

Or convert everything in the column to strings:

df["Col4"] = df["Col4"].astype(str)
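A quick check on a made-up column (the values are assumptions for illustration) suggests what this conversion leaves behind:

```python
import pandas as pd

# Hypothetical sample: integers mixed with one string.
df = pd.DataFrame({"Col4": [10, "unknown", 30]})

# Every entry becomes text, so the column holds a single type.
df["Col4"] = df["Col4"].astype(str)

# Collect the Python type names found in the column after conversion.
kinds = {type(v).__name__ for v in df["Col4"]}
print(kinds)
```

The numbers are now stored as text, so they would need converting back before any arithmetic. How missing values are represented after astype(str) may depend on the pandas version, so it is worth checking on your own data.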

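Alternatively, use the "fixed" HDF5 format, which allows columns with mixed data types. This saves the mixed-type column in Python pickle format and currently gives a PerformanceWarning.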

df.to_hdf(outpath, key='yourkey', format='fixed', mode='w')
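To check one of these options end to end, a round trip through a temporary file could look like the sketch below. The sample data, the file name `demo.h5` and the key `table` are assumptions for illustration. Since to_hdf relies on the optional PyTables package, the sketch skips the write when it is not installed.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# Hypothetical sample frame; Col4 mixes integers with one string.
df = pd.DataFrame({"Col3": [1.0, 2.0, None], "Col4": [10, "unknown", 30]})
df["Col4"] = pd.to_numeric(df["Col4"], errors="coerce")

# to_hdf relies on the optional PyTables package ("tables").
have_pytables = importlib.util.find_spec("tables") is not None

restored = None
if have_pytables:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.h5")  # hypothetical output file
        df.to_hdf(path, key="table", format="table", mode="w")
        restored = pd.read_hdf(path, key="table")
    print(restored.dtypes)
else:
    print("PyTables is not installed; skipping the HDF5 write")
```

Here the numeric coercion is applied first, so the "table" format accepts the frame; the same round trip can be used to test the other options.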

Source

2016-09-04 07:54:26
