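Is there a way to read input from a file without using a for loop? I am using the following to read a file into a list in Python:

    import fileinput

    data = fileinput.input()
    c = [int(i) for i in data]
    c.sort()

However, when the input is very large this takes a long time to process. The input format is: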
58457907
37850775
19743393
70718573
....
Use readlines and map. Opening the file with a with statement also seems to be more efficient, based on a test with a 200-line file.
    In [3]: %%timeit
       ...: with open("in.txt", 'rb') as f:
       ...:     lines = map(int, f)  # Python 2: map() returns a list
       ...:     lines.sort()
       ...:
    10000 loops, best of 3: 183 µs per loop

    In [5]: %%timeit
       ...: data = fileinput.input("in.txt")
       ...: c = [int(i) for i in data]
       ...: c.sort()
       ...:
    1000 loops, best of 3: 443 µs per loop
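The session above relies on Python 2, where map() returns a list. On Python 3, map() returns an iterator with no sort() method, so a rough equivalent would pass it to sorted() instead. The following is only a sketch: the helper name read_sorted_ints and the temporary demo file are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile


def read_sorted_ints(path):
    """Read one integer per line from `path` and return them sorted."""
    with open(path) as f:
        # int() ignores the trailing newline on each line.
        return sorted(map(int, f))


# Hypothetical demo file holding the sample values from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("58457907\n37850775\n19743393\n70718573\n")
    demo_path = tmp.name

result = read_sorted_ints(demo_path)
os.remove(demo_path)
print(result)  # [19743393, 37850775, 58457907, 70718573]
```

Timings for this variant are not shown here because they depend on the machine and the Python version.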
I wonder whether 'lines = map(int,f)' would be faster. – 2014-09-24 17:55:53
@Robᵩ, literally the same time, but probably a better idea – 2014-09-24 17:57:53
On my computer, 'lines = sorted(itertools.imap(int,f))' is the fastest, although 'lines = sorted(int(x) for x in f)' is close. I hate using 'map'. – 2014-09-24 18:01:20
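If I create a 'large' file:

    from random import randint

    with open('/tmp/nums.txt', 'w') as fout:
        # Floor division keeps the lower bound an integer on Python 3.
        a, b = 100002 // 10000, 100002 * 10000
        for i in range(100002):
            fout.write('{}\n'.format(randint(a, b)))

I can read it, convert it to integers, and sort the data in place like this:

    with open('/tmp/nums.txt') as fin:
        nums = [int(e) for e in fin]
    nums.sort()

The total time for this on my computer is 50 milliseconds. Is 50 milliseconds a long time?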
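To judge whether the total is dominated by parsing or by sorting, the two phases can be timed separately. This is a minimal sketch, assuming an in-memory io.StringIO stand-in for the file (so disk I/O is excluded) and a fixed random seed; neither comes from the original answer, and the numbers it prints will vary by machine.

```python
import io
import random
import time

# Hypothetical in-memory stand-in for /tmp/nums.txt, so the sketch is self-contained.
random.seed(0)
text = "".join("{}\n".format(random.randint(10, 10**9)) for _ in range(100002))

t0 = time.perf_counter()
nums = [int(line) for line in io.StringIO(text)]  # parse phase
t1 = time.perf_counter()
nums.sort()                                       # sort phase
t2 = time.perf_counter()

print("parse: {:.1f} ms, sort: {:.1f} ms".format((t1 - t0) * 1e3, (t2 - t1) * 1e3))
```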
A more formal timing:
    def f1():
        # List comprehension, then in-place sort.
        with open('/tmp/nums.txt') as fin:
            nums = [int(e) for e in fin]
        nums.sort()
        return nums

    def f2():
        # sorted() consumes the map() result directly.
        with open('/tmp/nums.txt') as fin:
            return sorted(map(int, fin))

    def f3():
        # Materialize map() into a list, then sort in place.
        with open('/tmp/nums.txt') as fin:
            nums = list(map(int, fin))
        nums.sort()
        return nums

    if __name__ == '__main__':
        import timeit
        import sys
        if sys.version_info.major == 2:
            # On Python 2, use the lazy imap so map() behaves like Python 3's.
            from itertools import imap as map
        result = []
        for f in (f1, f2, f3):
            fn = f.__name__
            fs = "f()"
            # `f` is a module-level name here, so the setup import sees the current function.
            ft = timeit.timeit(fs, setup="from __main__ import f", number=3)
            r = f()
            result.append((ft, fn, str(r[0:5]) + '...' + str(r[-6:-1])))
        result.sort(key=lambda t: t[0])
        for i, t in enumerate(result):
            ft, fn, r = t
            if i == 0:
                fr = '{}: {:.4f} secs is fastest\n\tf(x)={}\n========'.format(fn, ft, r)
            else:
                t1 = result[0][0]
                dp = (ft - t1) / t1
                fr = '{}: {:.4f} secs - {} is {:.2%} faster\n\tf(x)={}'.format(fn, ft, result[0][1], dp, r)
            print(fr)
You can see that the differences between them are not huge (except on PyPy, where f3 has a clear advantage, especially over f1):
Python 2.7.8:
f3: 0.2630 secs is fastest
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
========
f2: 0.2641 secs - f3 is 0.41% faster
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
f1: 0.2779 secs - f3 is 5.67% faster
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
Python 3.4.1:
f2: 0.1873 secs is fastest
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
========
f3: 0.1881 secs - f2 is 0.41% faster
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
f1: 0.2071 secs - f2 is 10.59% faster
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
PyPy:
f3: 0.1300 secs is fastest
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
========
f2: 0.1428 secs - f3 is 9.81% faster
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
f1: 0.2223 secs - f3 is 70.94% faster
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
PyPy3:
f3: 0.2483 secs is fastest
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
========
f2: 0.2588 secs - f3 is 4.23% faster
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
f1: 0.2878 secs - f3 is 15.88% faster
f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
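You are basically processing the file three times - do you really need to sort it in the end? - you open the file and read it, the first line reads the whole file, then you process each line, then you sort it - no wonder it takes a long time – gkusner 2014-09-24 17:44:19
Define "large"? – dawg 2014-09-24 17:51:46
Almost any construct has an implicit loop. Why are you avoiding an explicit loop? – dawg 2014-09-24 17:52:36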
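To illustrate the last comment, here is a small sketch that contains no explicit for statement yet still iterates internally at every step. The io.StringIO object is a hypothetical stand-in for the input file, holding the sample values from the question.

```python
import io

# Hypothetical sample input standing in for the file contents.
sample = io.StringIO("58457907\n37850775\n19743393\n70718573\n")

# No explicit `for` statement: read() pulls in all the text at once,
# split() breaks it on whitespace, and map() applies int to each token.
# Each of those steps still loops over the data internally.
values = sorted(map(int, sample.read().split()))
print(values)  # [19743393, 37850775, 58457907, 70718573]
```

Whether this is any faster than iterating over the file line by line would need to be measured on the actual data.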