Try working on the file one line at a time:
lowered = []

with open('tweets.txt', 'r') as handle:
    for line in handle:
        # keep accumulating the results ...
        lowered.append(line.lower())
        # ... or just dump the line to stdout right away
        print(line)

for line in lowered:
    # print or write to file or whatever you require
    pass  # placeholder so the loop body is valid Python
This way you reduce the memory overhead, which for large files might otherwise lead to swapping and kill performance.
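If a single pass over the data is enough, the same idea can be wrapped in a generator so that no list is built at all. The following is only a minimal sketch: `iter_lowered` is a hypothetical helper name, and the temporary file merely stands in for `tweets.txt`.

```python
import os
import tempfile


def iter_lowered(path):
    """Yield each line of the file at `path`, lowercased, one at a time."""
    with open(path, 'r') as handle:
        for line in handle:
            yield line.lower()


# Small self-contained demo on a throwaway file (a stand-in for tweets.txt).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Hello World\nSECOND Line\n')
    demo_path = tmp.name

# list() is used here only to show the result; in real use, loop over the
# generator directly so that only one line is held in memory at a time.
demo_result = list(iter_lowered(demo_path))
os.remove(demo_path)
print(demo_result)
```

Looping with `for line in iter_lowered('tweets.txt'):` keeps memory use roughly constant regardless of file size, since each line is discarded once the loop moves on.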
Here are some benchmarks on a file with about 1M lines:
# (1) real 0.223 user 0.195 sys 0.026 pcpu 98.71
with open('medium.txt') as handle:
    for line in handle:
        pass

# (2) real 0.295 user 0.262 sys 0.025 pcpu 97.21
with open('medium.txt') as handle:
    for i, line in enumerate(handle):
        pass
print(i)  # 1031124

# (3) real 21.561 user 5.072 sys 3.530 pcpu 39.89
with open('medium.txt') as handle:
    for i, line in enumerate(handle):
        print(line.lower())

# (4) real 1.702 user 1.605 sys 0.089 pcpu 99.50
lowered = []
with open('medium.txt') as handle:
    for i, line in enumerate(handle):
        lowered.append(line.lower())

# (5) real 2.307 user 1.983 sys 0.159 pcpu 92.89
lowered = []
with open('medium.txt', 'r') as handle:
    for i, line in enumerate(handle):
        lowered.append(line.lower())
with open('lowered.txt', 'w') as handle:
    for line in lowered:
        handle.write(line)
You can also iterate over two files at once:
# (6) real 1.944 user 1.666 sys 0.115 pcpu 91.59
with open('medium.txt', 'r') as src, open('lowered.txt', 'w') as sink:
    for i, line in enumerate(src):
        sink.write(line.lower())
Results as a table (the "real" time from each run above, sorted from fastest to slowest):
# (1) noop                        0.223
# (2) w/ enumerate                0.295
# (4) list buffer                 1.702
# (6) on-the-fly                  1.944
# (5) r -> list buffer -> w       2.307
# (3) stdout print               21.561
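To repeat a comparison like this on your own file, something along the following lines may be enough. It is a rough sketch, not the method used for the numbers above: it measures wall-clock time with `time.perf_counter` only (no user/sys split), and the names `time_call`, `noop`, `list_buffer` and `on_the_fly` are hypothetical helpers that mirror variants (1), (4) and (6).

```python
import time


def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls."""
    best = float('inf')
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def noop(path):
    # Variant (1): iterate over the file and discard every line.
    with open(path) as handle:
        for line in handle:
            pass


def list_buffer(path):
    # Variant (4): collect the lowercased lines in a list.
    lowered = []
    with open(path) as handle:
        for line in handle:
            lowered.append(line.lower())
    return lowered


def on_the_fly(path, out_path):
    # Variant (6): read, lowercase and write in a single pass.
    with open(path) as src, open(out_path, 'w') as sink:
        for line in src:
            sink.write(line.lower())
```

For example, `time_call(noop, 'medium.txt')` and `time_call(on_the_fly, 'medium.txt', 'lowered.txt')` return timings that can be compared directly. Absolute values will depend on the machine, the disk and the file system cache.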
This might help: http://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python