Optimize or speed up reading .xy files into Excel

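I have several .xy files (2 columns, x and y values). I have been trying to read them and paste the "y" values into a single Excel file (the "x" values are the same in all of these files). The code I have so far reads the files one by one, but it is very slow (about 20 seconds per file). I have a lot of .xy files, so the total time adds up quickly. The code I have so far is: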

import os, fnmatch, linecache, csv
from openpyxl import Workbook

wb = Workbook()
ws = wb.worksheets[0]
ws.title = "Sheet1"


def batch_processing(file_name):
    row_count = sum(1 for row in csv.reader(open(file_name)))
    for row in range(1, row_count + 1):  # linecache line numbers start at 1
        data = linecache.getline(file_name, row)
        fields = data.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines instead of aborting the whole file
        print(fields[1])
        print(data)
        ws["A" + str(row)] = float(fields[0])
        ws["B" + str(row)] = float(fields[1])

    print(file_name)
    wb.save(filename=os.path.splitext(file_name)[0] + ".xlsx")


workingdir = r"C:\Users\Mine\Desktop\P22_PC"  # raw string so the backslashes are kept literally
os.chdir(workingdir)
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        # join with root so files in subdirectories are found too
        batch_processing(os.path.join(root, file_name))

Any help is appreciated. Thanks.

Source

2012-11-26 groovyrv

Any suggestions? – groovyrv

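I'm not sure how 'linecache' works. Does it reuse the data from the file_name that was previously opened and closed in 'sum'? Or does it open the file just once, or does it have to reopen the file for every line? – Aprillion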

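I think your main problem is that you write every line of every file in the directory to Excel and save a workbook for each file. I'm not sure how long actually writing the values to Excel takes, but simply moving the save out of the loop and saving only once should save some time. Also, how big are these files? If they are large, linecache might be a good idea, but assuming they are not too big, you can probably do without it.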

def batch_processing(file_name):

    # Using 'with' is a better way to open files - it ensures they are
    # properly closed, etc. when you leave the code block
    with open(file_name) as f:
        # row_count = sum(1 for row in csv.reader(open(file_name)))
        # ^^^You actually don't need to do this at all (though it is clever :)
        # You are using it now to govern the loop, but the more Pythonic way is
        # to iterate over the file directly, as follows.
        # csv.reader splits on commas by default, so for whitespace-separated
        # .xy columns each line is split with str.split() instead.
        for line_no, line in enumerate(f, start=1):
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank lines
            # Unpack the two columns into val1 (x) and val2 (y)
            val1, val2 = fields[:2]
            print(val1, val2)  # You can also remove this - printing takes time too
            ws["A" + str(line_no)] = float(val1)
            ws["B" + str(line_no)] = float(val2)

    # Doing this here will save the file after you process an entire file.
    # You could save a bit more time and move this to after your walk statement -
    # that way, you are only saving once after everything has completed
    wb.save(filename=os.path.splitext(file_name)[0] + ".xlsx")
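Following the "save once" suggestion through to the goal stated in the question (one shared x column plus one y column per file, all in a single workbook), a minimal sketch might look like the following. It assumes the .xy files are whitespace-separated and that, as the question says, every file has the same x values. The helper names ('read_xy', 'find_xy_files', 'collect_columns', 'write_workbook') and the header row are hypothetical, not part of the original code. Each file is read in one pass, rows are written with openpyxl's 'Worksheet.append', and 'save' is called only once.

```python
import fnmatch
import os


def read_xy(path):
    """Parse a whitespace-separated two-column .xy file into x and y lists."""
    xs, ys = [], []
    with open(path) as f:
        for line in f:  # a single pass over the file, no linecache needed
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines instead of aborting the file
            xs.append(float(fields[0]))
            ys.append(float(fields[1]))
    return xs, ys


def find_xy_files(top, pattern="*_Cs.xy"):
    """Yield full paths of files under top matching pattern, sorted per directory."""
    for root, _dirs, files in os.walk(top):
        for name in sorted(fnmatch.filter(files, pattern)):
            yield os.path.join(root, name)


def collect_columns(paths):
    """Return (x_values, [(file_name, y_values), ...]), taking x from the first file."""
    x_values = []
    columns = []
    for path in paths:
        xs, ys = read_xy(path)
        if not columns:
            x_values = xs  # assumes x is identical in every file, as the question states
        columns.append((os.path.basename(path), ys))
    return x_values, columns


def write_workbook(x_values, columns, out_path):
    """Write x to column A and one y column per file, then save the workbook once."""
    from openpyxl import Workbook  # imported here so the parsing helpers work without openpyxl

    wb = Workbook()
    ws = wb.active
    ws.title = "Sheet1"
    ws.append(["x"] + [name for name, _ in columns])  # header row: x, then one file name per column
    for i, x in enumerate(x_values):
        # append writes a whole row at once; shorter files leave empty cells
        ws.append([x] + [ys[i] if i < len(ys) else None for _, ys in columns])
    wb.save(out_path)  # a single save for the whole batch
```

With the question's folder this would be roughly 'x, cols = collect_columns(find_xy_files(workingdir))' followed by 'write_workbook(x, cols, out_path)', where 'out_path' is whatever output file name you choose.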

Source

2012-11-29 01:50:38 RocketDonkey

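Before you replied, I had updated the code further and done exactly what you said: I moved the save after the walk statement. I guess the underlying question here is which is faster: linecache or csv.reader. I'll try csv.reader and get back to you. Thanks again. – groovyrv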
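One way to settle that question for your own data is to time both approaches on a single file. The sketch below is only an approximation under stated assumptions: the hypothetical 'read_with_linecache' mimics the question's loop (count the lines first, then one 'linecache.getline' call per line) without the Excel writes, and the hypothetical 'read_single_pass' reads the file once. Both split lines on whitespace rather than using csv.reader, since csv.reader splits on commas by default. The linecache cache is cleared before every timed run so each run starts cold; actual timings will depend on file size and machine.

```python
import linecache
import timeit


def read_with_linecache(path):
    """Count lines in one pass, then fetch each line via linecache (like the question's loop)."""
    with open(path) as f:
        row_count = sum(1 for _ in f)
    rows = []
    for lineno in range(1, row_count + 1):  # linecache line numbers start at 1
        fields = linecache.getline(path, lineno).split()
        if len(fields) >= 2:
            rows.append((float(fields[0]), float(fields[1])))
    return rows


def read_single_pass(path):
    """Read the file once, splitting each line as it is iterated."""
    rows = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:
                rows.append((float(fields[0]), float(fields[1])))
    return rows


def compare(path, repeat=5):
    """Return the best-of-repeat time in seconds for each reader on one file."""
    results = {}
    for name, func in (("linecache", read_with_linecache), ("single pass", read_single_pass)):
        # setup runs before each timed repetition, so linecache never starts warm
        times = timeit.repeat(lambda: func(path), setup=linecache.clearcache,
                              number=1, repeat=repeat)
        results[name] = min(times)
    return results
```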

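@groovyrv No problem. I haven't really used 'linecache', but from what I can tell the main benefit comes from accessing a file multiple times (hence the caching part; again, I could be completely wrong). The module's source is at http://hg.python.org/cpython/file/2.7/Lib/linecache.py, and one interesting thing is that on line 127 you can see that, to get the information, the file is actually opened and read with 'readlines()'. I believe that supports the multiple-access point: you start to see the benefit when you need to repeatedly access lines from the same (cached) file. – RocketDonkey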
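To check the caching point (and Aprillion's question about reopening the file), here is a small sketch that uses only the documented 'linecache.getline', 'linecache.checkcache' and 'linecache.clearcache' functions; the temporary file and its contents are made up for illustration. It shows that after the first 'getline' call, later calls are served from the cached copy instead of reopening the file: a change on disk stays invisible until 'checkcache' notices the new size or modification time.

```python
import linecache
import os
import tempfile


def demo_linecache_caching():
    """Show that linecache serves later calls from its cache until checkcache() runs."""
    fd, path = tempfile.mkstemp(suffix=".xy")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("1 10\n2 20\n")
        first = linecache.getline(path, 2)  # reads the whole file once and caches its lines

        with open(path, "w") as f:  # change the file on disk (now a different size)
            f.write("1 10\n2 20\n3 30\n")
        stale = linecache.getline(path, 3)  # cached copy has only 2 lines, so this is ""

        linecache.checkcache(path)  # size/mtime changed, so the stale entry is dropped
        fresh = linecache.getline(path, 3)  # re-read from disk
        return first, stale, fresh
    finally:
        linecache.clearcache()
        os.remove(path)
```

Calling 'demo_linecache_caching()' returns the tuple ("2 20\n", "", "3 30\n").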
