2016-10-05 21 views
1

Export 2Gb+ SELECT to CSV with Python (out of memory)

I am trying to export a large file from Netezza (using Netezza ODBC + pyodbc). This solution throws a MemoryError, and if I loop without the "list" it is very slow. Do you have any idea of an in-between solution that does not kill my server/Python process but can run faster?

cursorNZ.execute(sql)
archi = open("c:\test.csv", "w")
lista = list(cursorNZ.fetchall())
for fila in lista:
    registro = ''
    for campo in fila:
        campo = str(campo)
        registro = registro + str(campo) + ";"
    registro = registro[:-1]
    registro = registro.replace('None', 'NULL')
    registro = registro.replace("'NULL'", "NULL")
    archi.write(registro + "\n")

---- EDIT ----

Thank you, I tried this, where "sql" is the query and cursorNZ is:

connMy = pyodbc.connect(DRIVER=.....) 
cursorNZ = connNZ.cursor() 

chunk = 10 ** 5  # tweak this
chunks = pandas.read_sql(sql, cursorNZ, chunksize=chunk)
with open('C:/test.csv', 'a') as output:
    for n, df in enumerate(chunks):
        write_header = n == 0
        df.to_csv(output, sep=';', header=write_header, na_rep='NULL')

and got this: AttributeError: 'pyodbc.Cursor' object has no attribute 'cursor'. Any ideas?

+0

Possible duplicate of http://stackoverflow.com/questions/17707264/iterating-over-pyodbc-result-without-fetchall, particularly the reference to [fetchmany](http://code.google.com/p/pyodbc/wiki/Cursor#fetchmany). – tdelaney

+1

Pass `read_sql` the connection instead. I will edit my answer to reflect this. –

Answer

4

Do not use `cursorNZ.fetchall()`.

Instead, loop over the cursor directly:

with open("c:/test.csv", "w") as archi:  # note the fixed '/'
    cursorNZ.execute(sql)
    for fila in cursorNZ:
        registro = ''
        for campo in fila:
            campo = str(campo)
            registro = registro + str(campo) + ";"
        registro = registro[:-1]
        registro = registro.replace('None', 'NULL')
        registro = registro.replace("'NULL'", "NULL")
        archi.write(registro + "\n")
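The fetchmany approach mentioned in the comments also works here, and the `csv` module avoids the manual string building. Below is a minimal self-contained sketch, with `sqlite3` standing in for the pyodbc connection (the DB-API cursor interface is the same); the table, column names, and batch size are made up for the demo:

```python
import csv
import io
import sqlite3

# sqlite3 stands in for the pyodbc Netezza connection in this demo
cnn = sqlite3.connect(":memory:")
cnn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
cnn.executemany("INSERT INTO t VALUES (?, ?)", [(i, None) for i in range(7)])

cur = cnn.execute("SELECT * FROM t")
output = io.StringIO()  # use open("c:/test.csv", "w", newline="") in practice
writer = csv.writer(output, delimiter=";")
while True:
    filas = cur.fetchmany(1000)  # fetch a bounded batch, never the full result
    if not filas:
        break
    for fila in filas:
        # map Python None to the string NULL, as in the original loop
        writer.writerow(["NULL" if campo is None else campo for campo in fila])
```

Only one batch of rows is ever held in memory, so this stays flat no matter how large the result set is.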

Personally, I would just use pandas:

import pyodbc
import pandas

cnn = pyodbc.connect(DRIVER=.....)
chunksize = 10 ** 5  # tweak this
chunks = pandas.read_sql(sql, cnn, chunksize=chunksize)

with open('C:/test.csv', 'a') as output:
    for n, df in enumerate(chunks):
        write_header = n == 0
        df.to_csv(output, sep=';', header=write_header, na_rep='NULL')
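To see the chunked `read_sql` pattern end to end without a Netezza server, here is a self-contained sketch with `sqlite3` standing in for the pyodbc connection; the table, column names, and chunk size are made up for the demo, and `index=False` is an extra tweak (not in the answer above) to keep the pandas row index out of the CSV:

```python
import io
import sqlite3

import pandas

# sqlite3 stands in for the pyodbc Netezza connection in this demo
cnn = sqlite3.connect(":memory:")
cnn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
cnn.executemany("INSERT INTO t VALUES (?, ?)", [(i, None) for i in range(10)])

# Pass the *connection* to read_sql -- passing a cursor raises the
# AttributeError seen in the question's edit
chunks = pandas.read_sql("SELECT * FROM t", cnn, chunksize=4)

output = io.StringIO()  # use open('C:/test.csv', 'a') in practice
for n, df in enumerate(chunks):
    # header only on the first chunk; None becomes NULL via na_rep
    df.to_csv(output, sep=";", header=(n == 0), index=False, na_rep="NULL")
```

Each iteration holds only one chunk-sized DataFrame, which is what keeps the memory footprint bounded.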
+3

The `pandas` solution will use a lot of memory. – tdelaney

+0

@tdelaney `read_sql` can take a `chunksize` argument to help manage that. –

+2

But you still have a 2GB dataframe in memory. Granted, there is a speed advantage to `pandas`, so maybe some outer loop that builds smaller frames. – tdelaney