Reading an input file into a list in Python

Is there a way to read input from a file other than by using a for loop? I am using:

import fileinput

data = fileinput.input()
c = [int(i) for i in data]
c.sort()

But for very large inputs this takes a long time to process. The input format is,


2014-09-24 Abhishek Sharma

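You are basically processing the file 3 times - do you really need it sorted in the end? - try opening the file and reading it: the first line reads the whole file, the next processes each line, and then you sort it - no wonder it takes a long time – gkusner 2014-09-24 17:44:19

Define "large"? – dawg 2014-09-24 17:51:46

Almost any construct has an implicit loop. Why are you avoiding an explicit loop? – dawg 2014-09-24 17:52:36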
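As a rough illustration of the comments above: the conversion loop can be handed to built-ins instead of being written out, although a loop still runs internally. This is a minimal sketch for Python 3, assuming the file holds whitespace-separated integers (for example one per line); the helper name read_sorted_ints is made up for this example and does not come from the original post.

```python
def read_sorted_ints(path):
    """Return the integers stored in `path` as a sorted list.

    Hypothetical helper: assumes whitespace-separated integers.
    """
    with open(path) as f:
        # read() loads the whole file, split() breaks it on any whitespace,
        # map() converts each token lazily, and sorted() builds the final list.
        return sorted(map(int, f.read().split()))
```

Reading the whole file at once keeps the code short but holds the full text in memory, so whether it helps depends on how large "large" really is.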

Opening the file using with, together with readlines and map, seems to be more efficient in a test with a 200-line file.

In [3]: %%timeit 
with open("in.txt",'rb') as f: 
    lines = map(int,f) 
    lines.sort() 
    ...: 
10000 loops, best of 3: 183 µs per loop 


In [5]: %%timeit 
data = fileinput.input("in.txt") 
c = [int(i) for i in data] 
c.sort() 
    ...: 
1000 loops, best of 3: 443 µs per loop


2014-09-24 17:48:00

I wonder whether 'lines = map(int, f)' is faster. – 2014-09-24 17:55:53

@Robᵩ, literally the same time, but probably a better idea – 2014-09-24 17:57:53

On my computer, 'lines = sorted(itertools.imap(int, f))' is the fastest, although 'lines = sorted(int(x) for x in f)' is close. I hate using 'map'. – 2014-09-24 18:01:20
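The snippets in this answer and its comments are Python 2. In Python 3, itertools.imap no longer exists and the built-in map is already lazy and has no sort method, so the variants discussed above are usually written with sorted. The following is a sketch only, assuming one integer per line; both function names are invented for illustration.

```python
def sorted_with_map(path):
    """Python 3 counterpart of sorted(itertools.imap(int, f))."""
    with open(path) as f:
        # Iterating a file yields lines; int() ignores the trailing newline.
        return sorted(map(int, f))


def sorted_with_genexp(path):
    """Generator-expression spelling of the same conversion."""
    with open(path) as f:
        return sorted(int(line) for line in f)
```

Both return the same list, so the choice is mostly a matter of taste and of whatever a local benchmark shows.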

If I create a 'large' file:

from random import randint

with open('/tmp/nums.txt', 'w') as fout:
    # Floor division keeps the lower bound an integer on both Python 2 and 3
    a, b = 100002 // 10000, 100002 * 10000
    for i in range(100002):
        fout.write('{}\n'.format(randint(a, b)))

I can read it, convert it to integers, and sort the data in place like this:

with open('/tmp/nums.txt') as fin:  
    nums=[int(e) for e in fin] 
    nums.sort()

The total time to do this on my computer is 50 milliseconds. Is 50 milliseconds a long time?

More formal timing:

def f1():
    # List comprehension, then an in-place sort
    with open('/tmp/nums.txt') as fin:
        nums = [int(e) for e in fin]
        nums.sort()
    return nums

def f2():
    # sorted() consumes the map directly
    with open('/tmp/nums.txt') as fin:
        return sorted(map(int, fin))

def f3():
    # Build a list from map, then sort it in place
    with open('/tmp/nums.txt') as fin:
        nums = list(map(int, fin))
        nums.sort()
    return nums

if __name__ == '__main__':
    import timeit
    import sys
    if sys.version_info.major == 2:
        # Use the lazy map on Python 2 so f2 and f3 match Python 3 behavior
        from itertools import imap as map

    result = []
    for f in (f1, f2, f3):
        fn = f.__name__
        fs = "f()"
        # The setup string imports the current loop variable f from __main__
        ft = timeit.timeit(fs, setup="from __main__ import f", number=3)
        r = eval(fs)
        result.append((ft, fn, str(r[0:5]) + '...' + str(r[-6:-1])))

    result.sort(key=lambda t: t[0])

    for i, t in enumerate(result):
        ft, fn, r = t
        if i == 0:
            fr = '{}: {:.4f} secs is fastest\n\tf(x)={}\n========'.format(fn, ft, r)
        else:
            t1 = result[0][0]
            dp = (ft - t1) / t1
            fr = '{}: {:.4f} secs - {} is {:.2%} faster\n\tf(x)={}'.format(fn, ft, result[0][1], dp, r)

        print(fr)
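The harness above passes a statement string to timeit and re-imports f on every pass. timeit.timeit also accepts a zero-argument callable, which avoids the setup string and the eval call. Here is a reduced sketch of the same comparison; time_functions is a hypothetical helper name, and it reports only the raw timings, not the percentage differences.

```python
import timeit


def time_functions(funcs, number=3):
    """Return (seconds, name) pairs for `funcs`, fastest first."""
    results = []
    for func in funcs:
        # A callable can be passed directly in place of a statement string.
        seconds = timeit.timeit(func, number=number)
        results.append((seconds, func.__name__))
    results.sort()
    return results
```

With the three functions defined above, time_functions([f1, f2, f3]) would yield the timings in ascending order.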

You can see that the differences between them are not large (except on PyPy, where f3 has a clear advantage, especially over f1):

Python 2.7.8:

f3: 0.2630 secs is fastest 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131] 
======== 
f2: 0.2641 secs - f3 is 0.41% faster 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131] 
f1: 0.2779 secs - f3 is 5.67% faster 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]

Python 3.4.1:

f2: 0.1873 secs is fastest 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131] 
======== 
f3: 0.1881 secs - f2 is 0.41% faster 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131] 
f1: 0.2071 secs - f2 is 10.59% faster 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]

PyPy:

f3: 0.1300 secs is fastest 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131] 
======== 
f2: 0.1428 secs - f3 is 9.81% faster 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131] 
f1: 0.2223 secs - f3 is 70.94% faster 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]

PyPy3:

f3: 0.2483 secs is fastest 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131] 
======== 
f2: 0.2588 secs - f3 is 4.23% faster 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131] 
f1: 0.2878 secs - f3 is 15.88% faster 
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]


2014-09-24 18:02:36 dawg

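Using fileinput.FileInput instead of open added about 65% to the time in some tests I did. – tdelaney 2014-09-24 18:35:34

Yes, it is much faster than my previous version, thanks. But here I am using an input file, which will be supplied explicitly as an argument. – 2014-09-24 18:54:52

@AbhishekSharma: Is your program given a file name, or the 100,002 numbers via stdin? Fileinput supports both. – dawg 2014-09-24 20:26:29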
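To make the last comment concrete: when called without a file list, fileinput.input reads the files named in sys.argv[1:], and falls back to standard input if there are none. This is a minimal sketch for Python 3.2 or later, assuming one integer per line; sorted_ints_from_input is a name chosen for this example.

```python
import fileinput


def sorted_ints_from_input(files=None):
    """Return sorted integers read through fileinput.

    With files=None, fileinput uses the paths in sys.argv[1:],
    or standard input when no arguments were given.
    """
    # fileinput.input() works as a context manager since Python 3.2.
    with fileinput.input(files=files) as lines:
        return sorted(int(line) for line in lines)
```

Saved in a script, this would serve both "python script.py nums.txt" and "python script.py < nums.txt", at the speed cost for fileinput that the comments above mention.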
