How do I convert an OpenDocument spreadsheet to a pandas DataFrame?

The Python library pandas can read Excel spreadsheets and convert them to a pandas.DataFrame with the pandas.read_excel(file) command. Under the hood it uses the xlrd library, which does not support ods files.

Is there an equivalent of pandas.read_excel for ods files? If not, how can I do the same for an Open Document Format spreadsheet (ods file)? ODF is used by LibreOffice and OpenOffice.

Source

2013-07-24 Lamps1829

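If possible, save the sheet as CSV from the spreadsheet application and then use pandas.read_csv(). IIRC, an 'ods' spreadsheet file is actually XML (packed in a ZIP archive) that also carries quite a lot of formatting information. So if the content is tabular data, first extract the raw data into an intermediate file (CSV in this case), which you can then parse with other programs such as Python/pandas.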
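As a minimal sketch of the CSV route, assuming the sheet was exported with a header row and comma separators. The column names and values below are invented for illustration, and an in-memory string stands in for the exported file:

```python
import io

import pandas as pd

# Stand-in for a CSV file exported from the spreadsheet application;
# the columns and values are made up for this example.
csv_text = "id,date,amount\n007,2013-07-24,1.5\n012,2013-07-25,2.25\n"

# CSV carries no type information, so state the types that matter:
# keep "id" as text (preserving leading zeros) and parse "date" as datetimes.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": str}, parse_dates=["date"])

print(df.dtypes)
```

With a real export, the path of the CSV file would be passed to pandas.read_csv() in place of the StringIO object.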

Source

2013-07-24 13:33:18

Thanks for the input. It would be nice if there were something more direct, but I guess this is one possibility. – Lamps1829

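Nothing is more direct than a file that contains only the raw data. Such files have to be in a specific file format. There are binary formats (such as NetCDF or HDF5) and ASCII formats such as CSV. Unfortunately, CSV is not a real standard. Even so, CSV is very simple to handle in most cases. –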

Another option: read-ods-with-odfpy. This module takes an OpenDocument Spreadsheet as input and returns a list, from which a DataFrame can be created.
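The last step, from a list to a DataFrame, might look like the sketch below. It assumes the list holds one inner list per row with the column headers in the first row; the exact structure returned by that module is not shown here, so the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical parser output: one inner list per spreadsheet row,
# with the column headers in the first row.
rows = [
    ["a", "b", "c"],
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# Split off the header row and use it for the column names.
header, *data = rows
df = pd.DataFrame(data, columns=header)
print(df)
```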

Source

2013-07-24 17:42:06 Lamps1829

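Pandas supports reading Excel files (both xls and xlsx); see the read_excel command. You can use OpenOffice to save the spreadsheet as xlsx. Apparently the conversion can also be automated on the command line, using the convert-to command line parameter.

Reading the data from xlsx avoids some of the problems you may run into when converting to CSV first (date formats, number formats, unicode).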
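A sketch of how the command-line conversion could be driven from Python. It assumes a LibreOffice installation whose executable is named soffice and is on the PATH; the helper names and file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def soffice_convert_command(ods_path, outdir="."):
    """Build the LibreOffice command line that converts one file to xlsx."""
    return [
        "soffice", "--headless",
        "--convert-to", "xlsx",
        "--outdir", str(outdir),
        str(ods_path),
    ]


def convert_to_xlsx(ods_path, outdir="."):
    """Run the conversion and return the expected path of the xlsx file."""
    # Assumption: the executable is called "soffice" on this system.
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice executable not found on PATH")
    subprocess.run(soffice_convert_command(ods_path, outdir), check=True)
    return Path(outdir) / (Path(ods_path).stem + ".xlsx")
```

The returned path can then be handed to pandas.read_excel().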

Source

2015-01-09 16:37:28

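It seems the answer is No! And I would describe the tools for reading ODS as still patchy. If you are on POSIX, one option may be to export to xlsx on the fly and then take advantage of pandas' very nice xlsx import tools: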

unoconv -f xlsx -o tmp.xlsx myODSfile.ods

Altogether, my code looks like this:

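import os
import pandas as pd

# fileOlderThan() comes from cpblUtilities (described below); it is not defined here.
# Re-run the conversion only when tmp.xlsx is missing or stale.
if fileOlderThan('tmp.xlsx', 'myODSfile.ods'):
    os.system('unoconv -f xlsx -o tmp.xlsx myODSfile.ods')
xl_file = pd.ExcelFile('tmp.xlsx')
# Parse every sheet into its own DataFrame, keyed by sheet name.
dfs = {sheet_name: xl_file.parse(sheet_name)
       for sheet_name in xl_file.sheet_names}
df = dfs['Sheet1']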

Here fileOlderThan() is a function (see http://github.com/cpbl/cpblUtilities) that returns true if tmp.xlsx does not exist or is older than the .ods file.
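For readers who do not want the extra dependency, a stand-in with the behaviour described above could be sketched as follows. This is not the cpblUtilities implementation; the name file_older_than and the use of modification times are assumptions.

```python
import os


def file_older_than(target, source):
    """Return True if target is missing or was modified before source."""
    if not os.path.exists(target):
        return True
    # Compare last-modification timestamps (seconds since the epoch).
    return os.path.getmtime(target) < os.path.getmtime(source)
```

Calling file_older_than('tmp.xlsx', 'myODSfile.ods') would then decide whether the conversion needs to run again.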

Source

2015-03-07 20:56:52 CPBL

ODF (Open Document Format) documents can be read in Python with third-party modules.

Using ezodf, a simple ODS-to-DataFrame converter could look like this:

import pandas as pd
import ezodf

doc = ezodf.opendoc('some_odf_spreadsheet.ods')

print("Spreadsheet contains %d sheet(s)." % len(doc.sheets))
for sheet in doc.sheets:
    print("-" * 40)
    print("   Sheet name : '%s'" % sheet.name)
    print("Size of Sheet : (rows=%d, cols=%d)" % (sheet.nrows(), sheet.ncols()))

# convert the first sheet to a pandas.DataFrame
sheet = doc.sheets[0]
df_dict = {}
for i, row in enumerate(sheet.rows()):
    # row is a list of cells
    # assume the header is on the first row
    if i == 0:
        # columns as lists in a dictionary
        df_dict = {cell.value: [] for cell in row}
        # create index for the column headers
        col_index = {j: cell.value for j, cell in enumerate(row)}
        continue
    for j, cell in enumerate(row):
        # use header instead of column index
        df_dict[col_index[j]].append(cell.value)
# and convert to a DataFrame
df = pd.DataFrame(df_dict)

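Support for ODF spreadsheets (*.ods files) has been requested on the pandas issue tracker, https://github.com/pydata/pandas/issues/2311, but it is still not implemented.

ezodf was used in the unfinished PR9070 to implement ODF support in pandas. That PR is now closed (read the PR for the technical discussion), but it is still available as an experimental feature in a fork.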

Source

2016-03-23 14:21:53 davidovitch

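Works great. You should offer this as an external package (depending on 'ezodf' and 'pandas'), so that users can finally have a read_ods() function! – Antonello

Here is a quick and dirty hack that uses the ezodf module: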

import pandas as pd
import ezodf

def read_ods(filename, sheet_no=0, header=0):
    tab = ezodf.opendoc(filename=filename).sheets[sheet_no]
    return pd.DataFrame({col[header].value: [x.value for x in col[header+1:]]
                         for col in tab.columns()})

Test:

In [92]: df = read_ods(filename=fn)

In [93]: df
Out[93]:
     a    b    c
0  1.0  2.0  3.0
1  4.0  5.0  6.0
2  7.0  8.0  9.0

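Note: all the other useful parameters, such as header, skiprows, index_col and parse_cols, are NOT implemented in this function - feel free to update this question if you want to implement them.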
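One way a few of those parameters could be layered on is sketched below. It works on plain row lists, which with ezodf could be collected as [[cell.value for cell in row] for row in sheet.rows()]. The function name rows_to_frame and the exact semantics of its arguments are assumptions, not part of pandas or ezodf.

```python
import pandas as pd


def rows_to_frame(rows, header=0, skiprows=0, index_col=None):
    """Build a DataFrame from a list of row lists.

    skiprows:  number of leading rows to drop before anything else.
    header:    position (after skipping) of the row holding column names.
    index_col: optional column name to use as the index.
    """
    rows = rows[skiprows:]
    columns = rows[header]
    df = pd.DataFrame(rows[header + 1:], columns=columns)
    if index_col is not None:
        df = df.set_index(index_col)
    return df


rows = [
    ["exported by some tool", None, None],  # junk line to skip
    ["a", "b", "c"],
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
df = rows_to_frame(rows, skiprows=1, index_col="a")
print(df)
```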

Source

2017-02-19 18:16:10 MaxU

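If you only have a few .ods files to read, I would just open them in OpenOffice and save them as Excel files. If you have a lot of files, you could use the unoconv command on Linux to convert the .ods files to .xls programmatically (with bash).

Then it is really easy to read them in with pd.read_excel('filename.xls')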
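The batch step could also be scripted from Python instead of bash. The sketch below only builds the command lines, using the -f and -o options of unoconv; the function name unoconv_commands is hypothetical, and unoconv is assumed to be installed.

```python
from pathlib import Path


def unoconv_commands(directory):
    """Return one unoconv command per .ods file found in directory."""
    commands = []
    for ods in sorted(Path(directory).glob("*.ods")):
        # Write the .xls file next to the .ods file it was made from.
        xls = ods.with_suffix(".xls")
        commands.append(["unoconv", "-f", "xls", "-o", str(xls), str(ods)])
    return commands
```

Each command can be passed to subprocess.run(cmd, check=True), after which the resulting .xls files are ready for pd.read_excel().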

Source

2017-08-01 19:51:42 wordsforthewise
