How to speed up this very basic Python script that offsets lines of numbers

I have a simple text file containing numbers as space-delimited ASCII text:

150604849 
319865.301865 5810822.964432 -96.425797 -1610 
319734.172256 5810916.074753 -52.490280 -122 
319730.912949 5810918.098465 -61.864395 -171 
319688.240891 5810889.851608 -0.339890 -1790 
*<continues like this for millions of lines>*

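Basically, I want to copy the first line as it is. Then, for every following line, I want to offset the first value (x), offset the second value (y), leave the third value unchanged, and add an offset to the last number and halve it.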
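To make the rule concrete, here is a minimal sketch of the per-line transformation applied to one of the sample lines. The function name `shift_line` is hypothetical; the two offsets and the "add 2050, then halve" step are taken from the question's script. The sketch assumes Python 3 and assumes an integer result is wanted for the last column, hence `//`.

```python
OFFSET_X = -306000
OFFSET_Y = -5806000

def shift_line(line, offset_x=OFFSET_X, offset_y=OFFSET_Y):
    """Return the shifted version of one 'x y z value' data line."""
    x, y, z, value = line.split()
    return " ".join([
        str(float(x) + offset_x),          # shift x
        str(float(y) + offset_y),          # shift y
        z,                                 # third column passes through untouched
        str((int(value) + 2050) // 2),     # assumption: integer halving
    ])

print(shift_line("319865.301865 5810822.964432 -96.425797 -1610"))
```

The shifted x and y are ordinary floating-point sums, so their printed form may carry a few extra digits of rounding noise.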

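I cobbled together the code below as a Python learning exercise (apologies if it is crude and offensive, I really mean no offence), and it works fine. However, the input files I am using are several GB in size, and I am wondering whether there is any way to speed up execution. Currently a 740 MB file takes 2 minutes 21 seconds.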

import glob

# offset values
offsetx = -306000
offsety = -5806000

files = glob.glob('*.pts')
for file in files:
    currentFile = open(file, "r")
    out = open(file[:-4]+"_RGB_moved.pts", "w")
    firstline = str(currentFile.readline())
    out.write(str(firstline.split()[0]))

    while 1:
        lines = currentFile.readlines(100000)
        if not lines:
            break
        for line in lines:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0])+offsetx), str(float(words[1])+offsety), str(float(words[2])), str((int(words[3])+2050)/2)]
            out.write(" ".join(newwords))

Many thanks

Source

2013-06-02 Matthew Walker

Don't use .readlines(). Use the file directly as an iterator instead:

for file in files:
    with open(file, "r") as currentfile, open(file[:-4]+"_RGB_moved.pts", "w") as out:
        # Copy only the first field of the header line.
        firstline = next(currentfile)
        out.write(firstline.split(None, 1)[0])

        # Iterating over the file object reads it one line at a time.
        for line in currentfile:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0])+offsetx), str(float(words[1])+offsety), words[2], str((int(words[3]) + 2050)/2)]
            out.write(" ".join(newwords))

I also added some Python best practices, and you don't need to turn words[2] into a float and then back into a string again.
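The round trip is not only extra work; it can also change how the value is written out. A small illustration using a value from the sample data (a sketch for this page, not part of the original answer):

```python
z = "-52.490280"                 # third column of one of the sample lines

round_tripped = str(float(z))    # what str(float(words[2])) would write

print(round_tripped)             # the trailing zero is gone
print(z == round_tripped)        # so the text no longer matches the input
```

Passing the original string through keeps the third column byte-for-byte identical to the input.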

You could also consider using the csv module, which handles splitting and rejoining the lines in C code:

import csv

for file in files:
    # Binary mode, as the csv module expects on Python 2.
    with open(file, "rb") as currentfile, open(file[:-4]+"_RGB_moved.pts", "wb") as out:
        reader = csv.reader(currentfile, delimiter=' ', quoting=csv.QUOTE_NONE)
        writer = csv.writer(out, delimiter=' ', quoting=csv.QUOTE_NONE)

        # writerow() takes a sequence of fields, so pass a one-element list.
        writer.writerow(next(reader)[:1])

        for row in reader:
            newrow = [str(float(row[0])+offsetx), str(float(row[1])+offsety), row[2], str((int(row[3]) + 2050)/2)]
            writer.writerow(newrow)
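The code above dates from 2013 and opens the files in binary mode, which is what the csv module required on Python 2. On Python 3 the csv module works on text-mode files opened with `newline=''`. Below is a minimal sketch of the same idea for Python 3. It assumes fields are separated by single spaces and that integer halving is wanted (hence `//`); `shift_rows` is a hypothetical helper name, and in-memory streams stand in for the real files.

```python
import csv
import io

OFFSET_X = -306000
OFFSET_Y = -5806000

def shift_rows(src, dst, offset_x=OFFSET_X, offset_y=OFFSET_Y):
    """Copy the header field, then write the shifted rows from src to dst."""
    reader = csv.reader(src, delimiter=" ", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst, delimiter=" ", quoting=csv.QUOTE_NONE,
                        lineterminator="\n")
    writer.writerow(next(reader)[:1])        # header: keep only the first field
    for row in reader:
        writer.writerow([
            str(float(row[0]) + offset_x),
            str(float(row[1]) + offset_y),
            row[2],                          # unchanged
            str((int(row[3]) + 2050) // 2),  # assumption: integer halving
        ])

# In-memory stand-ins for open(path, newline="") on real files.
src = io.StringIO(
    "150604849 \n319734.172256 5810916.074753 -52.490280 -122 \n",
    newline="",
)
dst = io.StringIO(newline="")
shift_rows(src, dst)
print(dst.getvalue())
```

With real files, the two streams would be replaced by `open(path, newline="")` and `open(out_path, "w", newline="")`.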

Source

2013-06-02 14:43:18

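Sorry, I deleted my previous comment because it was incomplete and I have done some more testing. Let me try again. Thanks for your help, much appreciated. It looks like the non-CSV code gives a 14% speed-up, which is great! Original code: 2:17. Non-CSV code: 2:00. CSV code: 2:26. Would anyone care to offer any further suggestions?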

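Right, so the extra work done to detect quoting actually slows things down. You could try setting 'quoting=csv.QUOTE_NONE' (as a keyword argument to both the reader and the writer) to save on that overhead.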
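One way to check suggestions like this without waiting on a multi-gigabyte file is to time the parsing step on a small in-memory sample. The sketch below is only an assumption about how such a comparison might be set up: the helper names `with_split` and `with_csv` are hypothetical, it measures parsing alone (no float conversion or writing), and the numbers it prints will vary by machine and Python version.

```python
import csv
import io
import timeit

# 10,000 copies of one sample line, held in memory.
SAMPLE = "319865.301865 5810822.964432 -96.425797 -1610\n" * 10000

def with_split(text):
    """Parse every line with str.split and sum the last column."""
    total = 0
    for line in io.StringIO(text):
        total += int(line.split()[3])
    return total

def with_csv(text, quoting):
    """Parse every line with csv.reader in the given quoting mode."""
    total = 0
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=" ", quoting=quoting)
    for row in reader:
        total += int(row[3])
    return total

candidates = [
    ("str.split", lambda: with_split(SAMPLE)),
    ("csv, QUOTE_MINIMAL", lambda: with_csv(SAMPLE, csv.QUOTE_MINIMAL)),
    ("csv, QUOTE_NONE", lambda: with_csv(SAMPLE, csv.QUOTE_NONE)),
]
for label, func in candidates:
    seconds = timeit.timeit(func, number=5)   # total time for 5 runs
    print(f"{label:20s} {seconds:.3f} s")
```

Summing the last column is only there so that each variant does comparable work and the results can be checked against each other.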

Use the csv package. It is probably better optimized than your script, and it will simplify your code.

Source

2013-06-02 14:48:09 MatthieuBizien
