Converting a .parquet file to CSV using PyArrow

I have a .parquet file and I am using PyArrow to convert it to CSV.

import pyarrow.parquet as pq
import pandas as pd
from pandas import Series, DataFrame

filepath = "xxx"  # This contains the exact location of the file on the server
table = pq.read_table(filepath)

I used the code above to read the .parquet file into a table. Running table.shape returns (39014 rows, 19 columns).

The schema of the table is:

col1: int64 not null 
col2: string not null 
col3: string not null 
col4: int64 not null 
col5: string not null 
col6: string not null 
col7: int64 not null 
col8: int64 not null 
col9: string not null 
col10: string not null 
col11: string not null 
col12: string not null 
col13: string not null 
col14: string not null 
col15: string not null 
col16: int64 not null 
col17: int64 not null 
col18: int64 not null 
col19: string not null

When I run p = table.to_pandas() I get the following error:

ImportError: cannot import name RangeIndex

How can I convert this Parquet file to a DataFrame and then to CSV? Please help. Thanks.

Source

2017-05-05 ZeusofCode

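Which versions of pyarrow and pandas are you using? They may be incompatible. Pandas released a new version in the last few days, and a new PyArrow release will follow. For now, it may help to upgrade or downgrade your Pandas installation until the new pyarrow release is out. – xhochy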

Try 'from pandas import RangeIndex' and update your question with the output.
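Both comments come down to finding out which pandas and pyarrow releases are installed together. The following is a minimal sketch of one way to report them without importing either library, so it still works when the import itself is what fails. The helper name installed_versions is made up for illustration, and it assumes Python 3.8 or later for importlib.metadata.

```python
from importlib import metadata


def installed_versions(packages=("pandas", "pyarrow")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            # Reads the installed distribution's metadata; the package is not imported.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the printed versions into the question makes it easier to tell whether the two libraries are a mismatched pair.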

Try the following:

import pyarrow.parquet as pq
import pandas as pd


def read_pyarrow(path, use_threads=True):
    # Older pyarrow releases took nthreads=...; current releases use use_threads.
    return pq.read_table(path, use_threads=use_threads).to_pandas()


path = './test.parquet'

df1 = read_pyarrow(path)

# Write a pipe-delimited CSV without the DataFrame index.
df1.to_csv(
    './test.csv',
    sep='|',
    index=False,
    mode='w',
    lineterminator='\n',  # spelled line_terminator in pandas before 1.5
    encoding='utf-8')

Source

2018-02-09 21:44:16 SSingh
