2016-12-06

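How to read and write chunkwise with pandas and sqlalchemy

I am using python (version 3.4.4), pandas (version 0.19.1) and sqlalchemy (version 1.1.4) to read chunkwise from a large SQL table, preprocess the chunks and write them into a different SQL table. Reading chunk by chunk with pd.read_sql_query(verses_sql, conn, chunksize=10), where pd is the pandas import, verses_sql is the SQL query and conn is the database connection, works fine if I do: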

import pandas as pd 
from sqlalchemy import create_engine 

engine = create_engine('mssql+pymssql://<username>:<password>@<database>:1433/<FirstTable>') 
conn = engine.connect() 

verses_sql = '''SELECT [KA_Lang] FROM [dbo].[<FirstTable>]''' 

for chunk in pd.read_sql_query(verses_sql, conn, chunksize=10): 
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'[^a-zA-Z\u00C0-\u02AF]'," ") 
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'\s\s+', " ") 
    chunk['KA_Lang'] = chunk['KA_Lang'].str.lower() 
    print(chunk['KA_Lang'].head(1)) 

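The problem is this: if I try to write the preprocessed chunks chunk['KA_Lang'] into a second SQL table, let's call it SecondTable, only the 10 rows of the first chunk are transferred. The iteration stops there. Here is the adapted code: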

import pandas as pd 
from sqlalchemy import create_engine 
from sqlalchemy import Table, Column, Integer, String, MetaData 

engine = create_engine('mssql+pymssql://<username>:<password>@<database>:1433/<FirstTable>') 
conn = engine.connect() 

verses_sql = '''SELECT [KA_Lang] FROM [dbo].[<FirstTable>]''' 

for chunk in pd.read_sql_query(verses_sql, conn, chunksize=10): 
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'[^a-zA-Z\u00C0-\u02AF]'," ") 
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'\s\s+', " ") 
    chunk['KA_Lang'] = chunk['KA_Lang'].str.lower() 
    print(chunk['KA_Lang'].head(1)) 

    chunk.to_sql('<SecondTable>', conn, if_exists= 'append', index= False) 

conn.close() 

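How can I read chunk after chunk from one SQL table and write each one into a different SQL table? And why does the iteration over all chunks stop as soon as I include chunk.to_sql('<SecondTable>', conn, if_exists= 'append', index= False)?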

Answer

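After several days of trying different workarounds, I solved the problem. It is quite simple. Two different connections need to be defined: one for reading the chunks from the first SQL table and one for writing them into the second SQL table: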

engine = create_engine('mssql+pymssql://<username>:<password>@<database>:1433/<FirstTable>') 
engine1 = create_engine('mssql+pymssql://<username>:<password>@<database>:1433/<FirstTable>') 
conn = engine.connect() 
conn1 = engine1.connect() 

The line of code where chunk is written to the second table needs to be adjusted to:

chunk.to_sql('<SecondTable>', conn1, if_exists= 'append', index= False) 

Done!
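
For completeness, here is a minimal, self-contained sketch of the two-connection pattern. It rests on several assumptions that are not part of the thread: two throwaway SQLite files stand in for the SQL Server database so that it runs without a server, the helper names clean and copy_in_chunks are made up for illustration, and regex=True is passed explicitly because recent pandas versions no longer treat the pattern as a regular expression by default. A likely explanation for the original behaviour, not confirmed in the thread, is that a pymssql connection can only have one active query at a time, so writing on the connection that is still streaming chunks discards the remaining results.

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine


def clean(series):
    """Same preprocessing as in the question, with regex=True made explicit."""
    series = series.str.replace(r'[^a-zA-Z\u00C0-\u02AF]', ' ', regex=True)
    series = series.str.replace(r'\s\s+', ' ', regex=True)
    return series.str.lower()


def copy_in_chunks(read_conn, write_conn, query, target_table, chunksize=10):
    """Read with one connection, write each processed chunk with another."""
    rows = 0
    for chunk in pd.read_sql_query(query, read_conn, chunksize=chunksize):
        chunk['KA_Lang'] = clean(chunk['KA_Lang'])
        # The write goes through write_conn, so the pending result set
        # on read_conn is left untouched.
        chunk.to_sql(target_table, write_conn, if_exists='append', index=False)
        rows += len(chunk)
    return rows


# Demo setup (hypothetical): two temporary SQLite files instead of SQL Server.
tmp_dir = tempfile.mkdtemp()
source_engine = create_engine('sqlite:///' + os.path.join(tmp_dir, 'source.db'))
target_engine = create_engine('sqlite:///' + os.path.join(tmp_dir, 'target.db'))

# 25 sample rows, so the loop has to go through three chunks of up to 10 rows.
sample = pd.DataFrame({'KA_Lang': ['Verse  No. %d!' % i for i in range(25)]})
sample.to_sql('FirstTable', source_engine, index=False)

with source_engine.connect() as read_conn, target_engine.connect() as write_conn:
    copied = copy_in_chunks(read_conn, write_conn,
                            'SELECT KA_Lang FROM FirstTable', 'SecondTable')

result = pd.read_sql_query('SELECT KA_Lang FROM SecondTable', target_engine)
print(copied, len(result))
```

With SQL Server a second engine is probably not required: calling engine.connect() twice on the same engine should check out two separate connections from its pool, which can then be passed as read_conn and write_conn.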