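How to read and write chunkwise with pandas and sqlalchemy

I am using Python (version 3.4.4), pandas (version 0.19.1) and sqlalchemy (version 1.1.4) to read a large SQL table chunk by chunk, preprocess the chunks and write them to a different SQL table. Reading the chunks one after another with pd.read_sql_query(verses_sql, conn, chunksize=10), where pd is the pandas import, verses_sql is the SQL query and conn is the DB-API connection, works fine if I do: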
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mssql+pymssql://<username>:<password>@<database>:1433/<FirstTable>')
conn = engine.connect()
verses_sql = '''SELECT [KA_Lang] FROM [dbo].[<FirstTable>]'''

for chunk in pd.read_sql_query(verses_sql, conn, chunksize=10):
    # Replace everything except Latin letters (incl. U+00C0..U+02AF) with a space
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'[^a-zA-Z\u00C0-\u02AF]', " ")
    # Collapse runs of whitespace into a single space
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'\s\s+', " ")
    chunk['KA_Lang'] = chunk['KA_Lang'].str.lower()
    print(chunk['KA_Lang'].head(1))
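To make the three preprocessing steps in the loop concrete, here is a minimal sketch that applies the same regular expressions to a single string with the standard re module. The helper name clean_text and the sample string are illustrative assumptions, not part of the original code.

```python
import re


def clean_text(text):
    """Apply the loop's three preprocessing steps to one string (hypothetical helper)."""
    # 1. Replace every character that is not a basic Latin letter and not in
    #    the range U+00C0..U+02AF (mostly accented Latin letters and IPA
    #    extensions) with a space.
    text = re.sub(r'[^a-zA-Z\u00C0-\u02AF]', " ", text)
    # 2. Collapse runs of two or more whitespace characters into one space.
    text = re.sub(r'\s\s+', " ", text)
    # 3. Lowercase the result.
    return text.lower()


sample = "Caf\u00e9 -- 42 Stra\u00dfe"  # illustrative input only
cleaned = clean_text(sample)
print(repr(cleaned))
```

Digits and punctuation become spaces, so a leading or trailing space can remain after the second step; the loop in the post does not strip it either.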
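The problem is this: when I try to write the preprocessed chunk chunk['KA_Lang'] to a second SQL table, let's call it SecondTable, only the 10 rows of the first chunk are written. The iteration stops there. Here is the adapted code: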
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Integer, String, MetaData

engine = create_engine('mssql+pymssql://<username>:<password>@<database>:1433/<FirstTable>')
conn = engine.connect()
verses_sql = '''SELECT [KA_Lang] FROM [dbo].[<FirstTable>]'''

for chunk in pd.read_sql_query(verses_sql, conn, chunksize=10):
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'[^a-zA-Z\u00C0-\u02AF]', " ")
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'\s\s+', " ")
    chunk['KA_Lang'] = chunk['KA_Lang'].str.lower()
    print(chunk['KA_Lang'].head(1))
    # Writing on the same connection that is still being read from;
    # after this call the loop does not continue past the first chunk
    chunk.to_sql('<SecondTable>', conn, if_exists='append', index=False)

conn.close()
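How can I read chunks from one SQL table one after another and write them to a different SQL table? And why does the iteration over all chunks stop as soon as I include chunk.to_sql('<SecondTable>', conn, if_exists='append', index=False)?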
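One possible explanation, stated here as an assumption rather than a confirmed diagnosis, is that the pymssql driver allows only one active query per connection, so the INSERT issued by to_sql on conn discards the rows of the SELECT that have not been fetched yet. Under that assumption, a sketch of a workaround is to read on one connection and write on another. The example below uses two in-memory SQLite databases as stand-ins so that it runs without a SQL Server; the function name copy_in_chunks and the table names first_table and second_table are hypothetical.

```python
import sqlite3

import pandas as pd


def copy_in_chunks(read_conn, write_conn, query, target_table, chunksize=10):
    """Read `query` chunk by chunk on read_conn and append each chunk via write_conn.

    Hypothetical helper: the read cursor lives on read_conn only, so the
    INSERTs issued through write_conn cannot interfere with it.
    """
    n_chunks = 0
    for chunk in pd.read_sql_query(query, read_conn, chunksize=chunksize):
        chunk["KA_Lang"] = chunk["KA_Lang"].str.lower()  # stand-in preprocessing
        chunk.to_sql(target_table, write_conn, if_exists="append", index=False)
        n_chunks += 1
    return n_chunks


# Stand-in source and target databases (assumption: SQLite instead of SQL Server).
read_conn = sqlite3.connect(":memory:")
write_conn = sqlite3.connect(":memory:")

# Fill the source table with 25 illustrative rows.
read_conn.execute("CREATE TABLE first_table (KA_Lang TEXT)")
read_conn.executemany(
    "INSERT INTO first_table VALUES (?)",
    [("Row %d" % i,) for i in range(25)],
)
read_conn.commit()

n_chunks = copy_in_chunks(
    read_conn, write_conn, "SELECT KA_Lang FROM first_table", "second_table"
)
n_rows = write_conn.execute("SELECT COUNT(*) FROM second_table").fetchone()[0]
print(n_chunks, n_rows)
```

With the setup from the question, the analogous change would be to keep conn for pd.read_sql_query and hand chunk.to_sql either a second connection obtained from engine.connect() or the engine itself. Whether this resolves the behaviour depends on the assumption about the driver holding.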