How to quickly create an array from a large file?

I have, for example:

for line in IN.readlines():
    line = line.rstrip('\n')
    mas = line.split('\t')
    row = (int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4])
    self.inetnums.append(row)
IN.close()

If the file size is 120 MB, the script takes 10 seconds. Can I reduce this time?

2012-04-05 Bdfy

You're reading a 120GB file into memory? How much memory does your machine have? – interjay 2012-04-05 10:36:25

What hard drive reads at 12 GB/s? – 2012-04-05 10:41:24

You might gain some speed if you use a list comprehension:

inetnums = [(int(m[0]), int(m[1]), m[2], m[3], m[4])
            for m in (line.rstrip('\n').split('\t') for line in fin)]
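One pitfall worth spelling out: writing `(int(x) for x in ...)` inside the outer brackets produces a generator object per line, not a tuple of values. The sketch below (Python 3, with `io.StringIO` standing in for the real file and `parse_rows` as a hypothetical helper name) shows that difference and one way to build the question's five-field `(int, int, str, str, str)` rows in a single comprehension, assuming every line has exactly five tab-separated fields.

```python
import io

line = "1\t2\ta\tb\tc\n"
fields = line.rstrip('\n').split('\t')

# Parentheses around a comprehension make a lazy generator, not a tuple.
gen = (x for x in fields)
print(type(gen).__name__)  # generator

def parse_rows(fin):
    # Build (int, int, str, str, str) rows in one list comprehension;
    # the inner generator expression splits each line exactly once.
    return [
        (int(m[0]), int(m[1]), m[2], m[3], m[4])
        for m in (raw.rstrip('\n').split('\t') for raw in fin)
    ]

# Hypothetical sample data standing in for the large file.
rows = parse_rows(io.StringIO("1\t2\ta\tb\tc\n3\t4\td\te\tf\n"))
print(rows)  # [(1, 2, 'a', 'b', 'c'), (3, 4, 'd', 'e', 'f')]
```

If a flat tuple of converted values is what you want instead, wrap the generator in `tuple(...)`.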

Here is a comparison of the two versions:

>>> def foo2(): 
    fin.seek(0) 
    inetnums=[] 
    for line in fin: 
     line = line.rstrip('\n') 
     mas = line.split('\t') 
     row = (int(mas[0]), int(mas[1]), mas[2], mas[3]) 
     inetnums.append(row) 


>>> def foo1(): 
    fin.seek(0) 
    inetnums=[[int(x) for x in line.rstrip('\n').split('\t')] for line in fin] 

>>> cProfile.run("foo1()") 
     444 function calls in 0.004 CPU seconds 

    Ordered by: standard name 

    ncalls tottime percall cumtime percall filename:lineno(function) 
     1 0.003 0.003 0.004 0.004 <pyshell#362>:1(foo1) 
     1 0.000 0.000 0.004 0.004 <string>:1(<module>) 
     1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 
     220 0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects} 
     1 0.000 0.000 0.000 0.000 {method 'seek' of 'file' objects} 
     220 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects} 


>>> cProfile.run("foo2()") 
     664 function calls in 0.006 CPU seconds 

    Ordered by: standard name 

    ncalls tottime percall cumtime percall filename:lineno(function) 
     1 0.005 0.005 0.006 0.006 <pyshell#360>:1(foo2) 
     1 0.000 0.000 0.006 0.006 <string>:1(<module>) 
     220 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 
     1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 
     220 0.001 0.000 0.001 0.000 {method 'rstrip' of 'str' objects} 
     1 0.000 0.000 0.000 0.000 {method 'seek' of 'file' objects} 
     220 0.001 0.000 0.001 0.000 {method 'split' of 'str' objects} 


>>>
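The profiles above cover only 220 lines, each run takes a few milliseconds, and cProfile adds its own per-call overhead, which can inflate the version that makes more calls. A sketch of a plain `timeit` comparison on larger synthetic data, assuming Python 3 and the question's five-field rows; the data and the names `loop_version` / `comp_version` are hypothetical. Timings depend on the machine and Python version, so no numbers are claimed here.

```python
import io
import timeit

# Hypothetical synthetic input: 10,000 five-field, tab-separated lines.
DATA = "".join(f"{i}\t{i + 1}\tname{i}\tCC\tdesc\n" for i in range(10_000))

def loop_version(text):
    # Explicit loop with one append() call per line (like foo2).
    fin = io.StringIO(text)
    rows = []
    for line in fin:
        mas = line.rstrip('\n').split('\t')
        rows.append((int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4]))
    return rows

def comp_version(text):
    # Same work expressed as a list comprehension (like foo1).
    fin = io.StringIO(text)
    return [(int(m[0]), int(m[1]), m[2], m[3], m[4])
            for m in (line.rstrip('\n').split('\t') for line in fin)]

# Check that both versions build identical rows before comparing speed.
assert loop_version(DATA) == comp_version(DATA)

for fn in (loop_version, comp_version):
    seconds = timeit.timeit(lambda: fn(DATA), number=20)
    print(f"{fn.__name__}: {seconds:.3f} s for 20 runs")
```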

2012-04-05 10:56:33 Abhijit

Apart from the speed gained by removing 'readlines', will you really gain any speed from using a list comp? To me it looks like just another way of writing the same code. – jamylak 2012-04-05 12:00:43

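@jamylak: Consider that with the list comprehension you are not calling append on every pass through the loop. I've updated my answer with the cProfile output. – Abhijit 2012-04-05 14:51:02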
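A related middle ground, not suggested in this thread, is to keep the explicit loop but look up the bound `append` method once before it. This still makes one call per line, so it only removes the repeated attribute lookup; the comprehension may still be faster. A minimal Python 3 sketch, assuming the question's five-field rows; `parse_with_local_append` is a hypothetical name.

```python
import io

def parse_with_local_append(fin):
    rows = []
    append = rows.append  # look up the bound method once, outside the loop
    for line in fin:
        mas = line.rstrip('\n').split('\t')
        append((int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4]))
    return rows

print(parse_with_local_append(io.StringIO("5\t6\tx\ty\tz\n")))  # [(5, 6, 'x', 'y', 'z')]
```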

Remove readlines().

Just do

for line in IN:

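Using readlines() first builds a list of every line in the file and then iterates over that list, which you don't need to do. Without it, the for loop iterates over the file object directly, which reads and returns one line at a time.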
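A minimal Python 3 sketch of this suggestion, also using a `with` block so the explicit `IN.close()` is no longer needed. The helper name `load_inetnums`, the temporary sample file, and the five-field row layout are assumptions for illustration, not the poster's actual code.

```python
import os
import tempfile

def load_inetnums(path):
    # Iterate the file object directly: lines are read one at a time,
    # so no intermediate list of all lines (as readlines() builds) is created.
    rows = []
    with open(path, encoding='utf-8') as fin:  # file is closed even on error
        for line in fin:
            mas = line.rstrip('\n').split('\t')
            rows.append((int(mas[0]), int(mas[1]), mas[2], mas[3], mas[4]))
    return rows

# Write a small hypothetical sample file, load it, then clean up.
with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("10\t20\tnet-a\tUS\tfoo\n30\t40\tnet-b\tDE\tbar\n")
    path = tmp.name
try:
    result = load_inetnums(path)
finally:
    os.remove(path)
print(result)
```

Note that the resulting list of rows still has to fit in memory; only the intermediate list of raw lines is avoided.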

2012-04-05 10:32:01 jamylak
