Reading the data file should not be the bottleneck. The following code reads a 36 MB text file of 697,997 lines in about 0.2 seconds on my machine:
import time
start = time.perf_counter()  # time.clock() was removed in Python 3.8
with open('procmail.log', 'r') as f:
    lines = f.readlines()
end = time.perf_counter()
print('Readlines time:', end - start)
It produces the following output:
Readlines time: 0.1953125
Note that this code builds the complete list of lines in one go.
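If holding every line in memory at once is a concern, iterating over the file object reads one line at a time instead. This is only a minimal sketch, assuming Python 3; the helper name `count_lines` is hypothetical and stands in for whatever per-line work is needed:

```python
def count_lines(path):
    """Count lines by iterating over the file object instead of calling readlines()."""
    count = 0
    with open(path, 'r') as f:
        for _ in f:  # the file object yields one line per iteration
            count += 1
    return count
```

Calling `count_lines('procmail.log')` would walk the same file without building the full list first.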
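To keep track of how far you got, just write the number of lines you have processed to a file. Then, if you want to try again, read all the lines and skip the ones you have already finished: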
import os
# Read the data file
with open('list.txt', 'r') as f:
    lines = f.readlines()
skip = 0
try:
    # Did we try earlier? If so, skip what has already been processed.
    with open('lineno.txt', 'r') as lf:
        skip = int(lf.read())  # this should only be one number.
    del lines[:skip]  # Remove already processed lines from the list.
except (IOError, ValueError):
    pass  # No usable progress file: start from the first line.
with open('lineno.txt', 'w+') as lf:
    for n, line in enumerate(lines):
        # Do your processing here.
        lf.seek(0)  # go to beginning of lf
        lf.write(str(n + skip + 1) + '\n')  # write the number of lines processed so far
        lf.flush()
        os.fsync(lf.fileno())  # flush and fsync make sure the lf file is written.
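The same idea can be wrapped in a function. The following is only a sketch, assuming Python 3.3 or later (for `os.replace`); the names `process_resumable` and `handle_line` are hypothetical, and writing the count to a temporary file before renaming it is a variation on the approach above, not part of it:

```python
import os


def process_resumable(data_path, checkpoint_path, handle_line):
    """Process data_path line by line, recording progress in checkpoint_path.

    handle_line is a hypothetical callback standing in for the real work.
    Returns the number of lines handled in this run.
    """
    with open(data_path, 'r') as f:
        lines = f.readlines()

    # Read the count of lines finished by an earlier run, if any.
    skip = 0
    try:
        with open(checkpoint_path, 'r') as cf:
            skip = int(cf.read())
    except (IOError, ValueError):
        pass  # no usable checkpoint: start from the first line

    handled = 0
    tmp_path = checkpoint_path + '.tmp'
    for n, line in enumerate(lines[skip:], start=skip):
        handle_line(line)
        # Write the new count to a temporary file, then rename it over the
        # checkpoint, so the checkpoint is swapped in a single step.
        with open(tmp_path, 'w') as tf:
            tf.write(str(n + 1) + '\n')
            tf.flush()
            os.fsync(tf.fileno())
        os.replace(tmp_path, checkpoint_path)
        handled += 1
    return handled
```

If `handle_line` raises partway through, the checkpoint still holds the count of lines that completed, so calling the function again picks up at the line that failed.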
Are the numbers unique? – kalgasnik
Are you trying to write each number to a separate file? If so, why – root
You could try using Postgres and pl/pgsql to do any calculations in the database itself... – moooeeeep