Processing data from several text files


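Any suggestions on how to grab data from multiple text files and process it (for example, computing totals)? I have been trying to do it in Python but keep hitting dead ends.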

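Every time you perform an operation, a machine generates a summary file in text format, for example when screening good apples from a batch. First you load the apples and separate the good from the bad; then you can reload the bad apples to retest them, and some are recovered. So each batch generates at least 2 summary files, depending on how many times you reload apples for recovery.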

Here are examples of the text files:

File 1:

general Info: 
    Batch No.  : A2J3 
    Operation  : Test 
    Fruit   : Apple 
    Operation Number : A5500 
    Quantity In  : 10 
yield info: 
    S1 S2  Total Bin Name 
    5 2  7  good 
    1 2  3  bad

File 2:

general Info: 
    Batch No.  : A2J3 
    Operation  : Test 
    Fruit   : Apple 
    Operation Number : A5500 
    Quantity In  : 3 
yield info: 
    S1 S2  Total Bin Name 
    1 1  2  good 
    0 0  1  bad

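I would like to take the data from all the txt files in a folder and merge the test results according to the following criteria:

Process files together as one batch by recognising which txt files come from the same batch number and the same operation (based on the content of the txt file, not the file name).

Merge the data from the 2 (or more) summary files into the following csv format: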

Lot: 
Operation: 
Bin  First Pass Second Pass Final Yield %Yield 
Good  7   2   9   90% 
Bad  3   1   1   10%

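The number of site columns (S1, S2, ...) is variable: there can be anywhere from 1 to 14, but never fewer than 1. The bins can also be of several types across the text files (not limited to good and bad), but there will always be exactly one good bin.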

Bins: 
Good 
Semi-bad 
Bad 
Worst 
...

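I am new to Python and have only used this scripting language at school, so I know only the very basics. This task is a bit overwhelming for me. I started by processing a single text file and extracting the data I want, for example the batch number: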

with open('R0.txt') as fh_d10SunFile: 
    fh_d10SumFile_perline = fh_d10SunFile.read().splitlines() 
    #print fh_d10SumFile_perline 

TestProgramName_str = fh_d10SumFile_perline[CONST.TestProgram_field].split(':')[1] 
LotNumber_str  = fh_d10SumFile_perline[CONST.LotNumber_field].split(':')[1] 
QtyIn_int   = int(fh_d10SumFile_perline[CONST.UnitsIn_field].split(':')[1]) 
TestIteration_str = fh_d10SumFile_perline[CONST.TestIteration_field].split(':')[1] 
TestType_str  = fh_d10SumFile_perline[CONST.TestType_field].split(':')[1]

Then I grab all the bins in the summary file:

SoftBins_str = filter(lambda x: re.search(r'bin',x),fh_d10SumFile_perline) 
for index in range(len(SoftBins_str)): 
    SoftBins_data_str = [l.strip() for l in SoftBins_str[index].split(' ') if l.strip()] 
    SoftBins_data_str.reverse() 
    bin2bin[SoftBins_data_str[0]] = SoftBins_data_str[2]

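After that I got stuck, because I don't know how to do the reading and parsing for n text files that each contain a variable number of sites (S1, S2, ...). How can I get this information from n text files, process it in memory (if that is even possible with Python), and then write the calculated results to a csv output file?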

Source

2016-10-17 Syd Pao

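You say there are **at least two summary files**, but the output format is fixed at two passes? I am also assuming that 'Lot:' is the current 'Batch No.' and 'Operation:' is 'Test'?

@MartinEvans Yes, that is correct.

@SreejithMenon I have updated the question with more details and the approach I tried.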

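The following should help get you started. Since your text files have a fixed format, reading and parsing them is relatively straightforward. This script searches the current folder for all text files, reads each one, and stores the batches in a dictionary keyed on the batch name, so that all files belonging to the same batch are grouped together.

After all the files have been processed, it creates a summary for each batch and writes it to a single csv output file.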

from collections import defaultdict
import glob
import csv

# Map each batch number to its list of passes (one entry per summary file)
batches = defaultdict(list)

# sorted() keeps the pass order stable; this assumes file names sort in test order
for text_file in sorted(glob.glob('*.txt')):
    with open(text_file) as f_input:
        rows = [row.strip() for row in f_input]

    # Rows 1-5 are "name : value" pairs: batch, operation, fruit, op number, quantity
    header = [rows[x].split(':')[1].strip() for x in range(1, 6)]
    bins = {}

    # Row 6 is "yield info:" and row 7 the column titles, so data starts at row 8
    for yield_info in rows[8:]:
        if not yield_info:
            continue  # skip blank lines, e.g. at the end of the file
        s1, s2, total, bin_name = yield_info.split()
        bins[bin_name] = [int(s1), int(s2), int(total)]

    batches[header[0]].append(header + [bins])


# Python 3: csv files are opened in text mode with newline=''
with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output, delimiter='\t')

    for batch, passes in batches.items():
        # bin name -> [list of per-pass totals, sum over all passes]
        bins_output = defaultdict(lambda: [[], 0])
        total_yield = 0

        for lot, operation, fruit, op_num, quantity, bins in passes:
            for bin_name, (s1, s2, total) in bins.items():
                bins_output[bin_name][0].append(total)
                bins_output[bin_name][1] += total
                total_yield += total

        csv_output.writerows([['Lot:', lot], ['Operation:', operation]])
        csv_header = ["Bin"] + ['Pass {}'.format(x) for x in range(1, 1 + len(passes))] + ["Final Yield", "%Yield"]
        csv_output.writerow(csv_header)

        for bin_name in sorted(bins_output.keys()):
            entries, total = bins_output[bin_name]
            percentage_yield = '{:.1f}%'.format((100.0 * total) / total_yield)
            csv_output.writerow([bin_name] + entries + [total, percentage_yield])

        csv_output.writerow([])  # empty row to separate batches

This gives you a tab-delimited csv file that looks like this:

Lot: A2J3 
Operation: Test 
Bin Pass 1 Pass 2 Final Yield %Yield 
Bad 3 1 4 30.8% 
Good 7 2 9 69.2%
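The percentages above are computed against the sum of all pass totals (13 units), whereas the table in the question expects 90% and 10%. The following is only a sketch of the calculation the question appears to ask for. It rests on assumptions that are not confirmed anywhere in this thread: good units accumulate over the passes, every other bin keeps only its count from the last pass, and %Yield is taken relative to the Quantity In of the first pass. The function name `summarise` and its arguments are hypothetical.

```python
def summarise(passes, good_bin="good"):
    """Combine per-pass bin totals for one batch.

    `passes` is a list of (quantity_in, {bin_name: total}) tuples in test order.
    Assumptions (not confirmed by the question): good units accumulate over
    the passes, every other bin keeps only its last-pass count, and %Yield
    is relative to the Quantity In of the first pass.
    """
    first_quantity = passes[0][0]
    last_bins = passes[-1][1]
    # Collect every bin name seen in any pass, in a stable order
    bin_names = sorted({name for _, bins in passes for name in bins})

    rows = []
    for name in bin_names:
        per_pass = [bins.get(name, 0) for _, bins in passes]
        if name == good_bin:
            final = sum(per_pass)           # recovered units add to the good count
        else:
            final = last_bins.get(name, 0)  # only what is still rejected at the end
        percent = "{:.0f}%".format(100.0 * final / first_quantity)
        rows.append([name] + per_pass + [final, percent])
    return rows


# The two sample files from the question, reduced to quantity and bin totals
passes = [(10, {"good": 7, "bad": 3}), (3, {"good": 2, "bad": 1})]
for row in summarise(passes):
    print(row)
```

With the two sample files this yields 9 good (90%) and 1 bad (10%), which matches the table in the question.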

Note that the script has been updated to handle any number of bin types.
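The question also mentions that the number of site columns (S1, S2, ...) can vary from 1 to 14, while the script unpacks exactly two. A minimal sketch of how a yield row could be parsed in that case, assuming the layout is always the site counts, then the total, then a bin name that contains no spaces. `parse_yield_row` is a hypothetical helper, not part of the script.

```python
def parse_yield_row(line):
    """Split one yield row into per-site counts, the total, and the bin name.

    Assumes the layout "<S1> ... <Sn> <Total> <Bin Name>" and that the
    bin name is a single token without spaces (for example "Semi-bad").
    """
    # Star-unpacking takes the last two tokens as total and bin name,
    # and collects however many site columns precede them.
    *sites, total, bin_name = line.split()
    return [int(s) for s in sites], int(total), bin_name


sites, total, bin_name = parse_yield_row("    5 2  7  good ")
print(sites, total, bin_name)
```

In the script, this could stand in for the fixed four-way unpacking of `yield_info.split()`, with the list of site counts stored alongside the total.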

Source

2016-10-17 09:36:35
