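Python - Multithreading help - Reading multiple files - ETL to SQL Server
I'm working on a program that reads DBF files from a local drive and loads the data into SQL Server tables. I'm pretty green with Python, and most of what I've found about multithreading is confusing. Reading and inserting are slow, yet my CPU usage shows plenty of spare capacity. I'm also running on an SSD.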
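A minimal sketch of the threading approach with `concurrent.futures.ThreadPoolExecutor`, assuming each zip can be processed independently. Threads mainly help while a worker waits on disk or on SQL Server. dbfread is pure Python, so its parsing still competes for the GIL. pyodbc reports a DB-API `threadsafety` of 1, which means threads may share the module but not connections, so the sketch keeps one connection per thread in a `threading.local`. `get_connection`, `run_in_threads` and the `connect` factory are hypothetical names, not part of any library.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One slot per thread; each thread sees only its own attributes.
_local = threading.local()


def get_connection(connect):
    """Return the calling thread's connection, creating it on first use.

    `connect` is any zero-argument factory, for example
    lambda: pyodbc.connect(conn_str)  (conn_str is hypothetical).
    Connections created here are never closed; a real program would
    close them when the pool is done.
    """
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = connect()
        _local.conn = conn
    return conn


def run_in_threads(items, work, max_workers=4):
    """Apply work(item) to every item on a pool of threads.

    Results come back in the same order as `items`; an exception raised
    in a worker is re-raised here when its result is reached.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, items))
```

In the question's program, `work` would be a hypothetical `load_one_zip(zip_path)` containing the body of the outer `for r in os.listdir(dr):` loop, obtaining its cursor via `get_connection(...)` instead of using the shared `cnxn`.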
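This code will be scaled up to about 20 DBF files in each of roughly 400 zip files, so we're talking about 8,000 DBF files in total.
I'm having a hard time getting this to work. Can you give me some pointers?
Here is my code (it's a bit messy, but I'll clean it up later):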
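Because each zip is independent of the others, a zip is a natural unit of work to hand to a thread or process pool. A minimal stdlib sketch that lists the DBF entries inside every file in a folder without extracting anything, assuming (as the loop in the question does) that every file in the folder is a zip archive; `dbf_members` and `build_work_list` are hypothetical names.

```python
import os
from zipfile import ZipFile


def dbf_members(zip_path):
    """Return the names of the .dbf entries inside one zip (case-insensitive)."""
    with ZipFile(zip_path) as zf:
        return [n for n in zf.namelist() if n.lower().endswith(".dbf")]


def build_work_list(folder):
    """Map each file in `folder` to the DBF entries it contains.

    Assumes every file in the folder is a zip archive, whatever its
    extension (the question's files end in .sss).
    """
    work = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        work[path] = dbf_members(path)
    return work
```

Summing the lengths of the lists gives the total number of DBF files, and the keys are the independent jobs to distribute.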
import os
import datetime
import shutil
from zipfile import ZipFile

import pyodbc
from dbfread import DBF

# SQL Server connection test.
# The raw string (r'...') stops "\t" in "localhost\test" from being read as a tab character.
cnxn = pyodbc.connect(r'DRIVER={SQL Server};SERVER=localhost\test;DATABASE=TEST_DBFIMPORT;UID=test;PWD=test')
cursor = cnxn.cursor()

dr = 'e:\\Backups\\dbf\\'           # incoming backup zips
work = 'e:\\Backups\\work\\'        # extraction area
archive = 'e:\\Backups\\archive\\'  # processed zips

for r in os.listdir(dr):
    curdate = datetime.datetime.now()
    filepath = dr + r
    process = work + r
    arc = archive + r
    pth = r.replace(".sss", "")
    zipfolder = work + pth
    filedateunix = os.path.getctime(filepath)
    filedateconverted = datetime.datetime.fromtimestamp(
        int(filedateunix)).strftime('%Y-%m-%d %H:%M:%S')

    # Move the backup into the work folder and extract it
    shutil.move(filepath, process)
    with ZipFile(process) as zf:
        zf.extractall(zipfolder)

    # Record the backup itself
    cursor.execute(
        "insert into tblBackups(backupname, filedate, dateadded) values(?,?,?)",
        pth, filedateconverted, curdate)
    cnxn.commit()

    for dirpath, subdirs, files in os.walk(zipfolder):
        for file in files:
            dateadded = datetime.datetime.now()
            if file.endswith(('.dbf', '.DBF')):
                dbflocation = os.path.abspath(os.path.join(dirpath, file)).lower()

                if "\\bk.dbf" in dbflocation:
                    table = DBF(dbflocation, lowernames=True, char_decode_errors='ignore')
                    for record in table.records:
                        rec1 = str(record['code'])
                        rec2 = str(record['name'])
                        rec3 = str(record['addr1'])
                        rec4 = str(record['addr2'])
                        rec5 = str(record['city'])
                        rec6 = str(record['state'])
                        rec7 = str(record['zip'])
                        rec8 = str(record['tel'])
                        rec9 = str(record['fax'])
                        # 9 columns, 9 placeholders, 9 values (rec10-rec13 were never defined)
                        cursor.execute(
                            "insert into tblbk(code,name,addr1,addr2,city,state,zip,tel,fax) values(?,?,?,?,?,?,?,?,?)",
                            rec1, rec2, rec3, rec4, rec5, rec6, rec7, rec8, rec9)
                    cnxn.commit()  # commit once per file rather than once per row

                if "\\cr.dbf" in dbflocation:
                    table = DBF(dbflocation, lowernames=True, char_decode_errors='ignore')
                    for record in table.records:
                        rec2 = str(record['cal_desc'])
                        rec3 = str(record['b_date'])
                        rec4 = str(record['b_time'])
                        rec5 = str(record['e_time'])
                        rec6 = str(record['with_desc'])
                        rec7 = str(record['recuruntil'])
                        rec8 = record['notes']
                        rec9 = dateadded
                        cursor.execute(
                            "insert into tblcalendar(cal_desc,b_date,b_time,e_time,with_desc,recuruntil,notes,dateadded) values(?,?,?,?,?,?,?,?)",
                            rec2, rec3, rec4, rec5, rec6, rec7, rec8, rec9)
                    cnxn.commit()

    # Archive the processed zip and remove the extracted files
    shutil.move(process, archive)
    shutil.rmtree(zipfolder)
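The code above issues one `cursor.execute` per row, so every row is a separate round trip to SQL Server; at 8,000 files that is likely a bigger bottleneck than the CPU. A common alternative is to batch rows with `cursor.executemany`. A minimal sketch, assuming pyodbc 4.0.19 or later, where setting `cursor.fast_executemany = True` sends each batch as a parameter array (it targets Microsoft's ODBC drivers and may not help with the legacy `{SQL Server}` driver used above). `insert_rows_batched`, `bk_rows` and `BK_FIELDS` are hypothetical names.

```python
# Column order of the question's tblbk insert statement.
BK_FIELDS = ("code", "name", "addr1", "addr2", "city", "state", "zip", "tel", "fax")


def bk_rows(records):
    """Turn dict-like records (e.g. from iterating a dbfread DBF) into
    tuples of strings in BK_FIELDS order."""
    for record in records:
        yield tuple(str(record[f]) for f in BK_FIELDS)


def insert_rows_batched(cursor, sql, rows, batch_size=1000):
    """Send rows with one executemany call per batch instead of one
    execute per row. Returns the number of rows sent.

    The caller is responsible for committing afterwards.
    """
    batch = []
    sent = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            cursor.executemany(sql, batch)
            sent += len(batch)
            batch = []
    if batch:  # leftover partial batch
        cursor.executemany(sql, batch)
        sent += len(batch)
    return sent
```

For `bk.dbf` this would replace the inner loop with something like `cursor.fast_executemany = True`, then `insert_rows_batched(cursor, insert_sql, bk_rows(DBF(dbflocation, lowernames=True, char_decode_errors='ignore')))` and a single `cnxn.commit()`, where `insert_sql` is the question's `insert into tblbk(...)` statement.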
Another option I'm considering is multiprocessing, which might be simpler. – HMan06
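A minimal sketch of the multiprocessing route with `concurrent.futures.ProcessPoolExecutor`, assuming each zip can be processed independently. Each worker process has its own interpreter and GIL, so the pure-Python DBF parsing can actually run in parallel. `load_one_zip` is a hypothetical placeholder for the body of the outer loop; in a real program it would open its own pyodbc connection, because connections and cursors cannot be passed to worker processes.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor


def load_one_zip(zip_path):
    """Hypothetical worker standing in for the per-zip body of the loop.

    The real version would connect to SQL Server itself, extract the zip,
    read the DBFs and insert them. Here it only returns the file name so
    the sketch runs without a database.
    """
    return os.path.basename(zip_path)


def run_in_processes(zip_paths, work=load_one_zip, max_workers=None,
                     start_method=None):
    """Run work(path) for each path in a pool of worker processes.

    `work` must be a module-level function because it is pickled by name.
    start_method=None uses the platform default ("spawn" on Windows).
    """
    ctx = multiprocessing.get_context(start_method)
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as pool:
        # Results come back in the same order as zip_paths.
        return list(pool.map(work, zip_paths, chunksize=4))

# On Windows, call run_in_processes only from inside
# if __name__ == "__main__":  because each worker re-imports the main module.
```

Compared with threads, this avoids GIL contention during parsing at the cost of one database connection per process, so `max_workers` also caps the number of concurrent connections to SQL Server.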