2016-12-30 12 views

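Merging files, but outputting the header row only once

I have seen a few earlier posts with solutions that worked for others, but for some reason they have not worked for me. I am trying to write a Python script that will 1) merge 3 files that share the same format, 2) remove only the duplicate headers, 3) sort the rows by Specimen_ID, and 4) add 2 new blank lines between each group of rows with a unique Specimen_ID (that is, every three rows need to be followed by a gap, except that the first group is offset by the header).

I have a script that handles the first two steps and part of the last one: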

import glob

read_files = glob.glob("*.txt")

header_saved = False
linecnt = 0
with open("merged_data.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            # The first line of every input file is its header.
            header = next(infile)
            if not header_saved:
                # Write the header only once, from the first file.
                outfile.write(header)
                header_saved = True
            for line in infile:
                outfile.write(line)
                linecnt = linecnt + 1
                if (linecnt % 3) == 0:
                    # Bytes literal, because the file is opened in binary mode.
                    outfile.write(b"\n\n")

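Any suggestions for sorting the rows? Also, I found that when the data are exported from Excel as tab-delimited txt files, the script produces an output that contains only the contents of the first infile and nothing from the others. If I simply copy and paste the data into new txt files and use those as infiles, I have no problem. Does anyone know why I am running into this?

Example input file text (infile 1):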

Specimen_ID Measured_by_initals Measure_date Sex Beak_length Pronotal_width Right_fore_femur_length Right_fore_femur_width Left_fore_femur_length Left_fore_femur_width Right_hind_femur_length Right_hind_femur_width Left_hind_femur_length Left_hind_femur_width Right_hind_femur_area Left_hind_femur_area Right_hind_tibia_width Left_hind_tibia_width Notes 
a 1 30-Dec-16 M 4 4 4 4 4 4 4 4 4 4 4 4 4 4 
b 1 30-Dec-16 F 4 4 4 4 4 4 4 4 4 4 4 4 4 4 beak bent 
c 1 30-Dec-16 M 4 4 4 4 4 4 4 4 4 4 4 4 4 4 
d 1 30-Dec-16 F 4 4 4 4 4 4 4 4 4 4 4 4 4 4 
e 1 30-Dec-16 F 4 4 4 4 4 4 4 4 4 4 4 4 4 4 pronotum deformed 
f 1 30-Dec-16 F 4 4 4 4 4 4 4 4 4 4 4 4 4 4 

Example input file text (infile 2):

Specimen_ID Measured_by_initals Measure_date Sex Beak_length Pronotal_width Right_fore_femur_length Right_fore_femur_width Left_fore_femur_length Left_fore_femur_width Right_hind_femur_length Right_hind_femur_width Left_hind_femur_length Left_hind_femur_width Right_hind_femur_area Left_hind_femur_area Right_hind_tibia_width Left_hind_tibia_width Notes 
a 2 30-Dec-16 M 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
b 2 30-Dec-16 F 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
c 2 30-Dec-16 M 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
d 2 30-Dec-16 F 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
e 2 30-Dec-16 F 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
f 2 30-Dec-16 F 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 

Could you please share samples of all three files? Your code looks correct. – Shijo

Answer

Your solution should work perfectly unless there is something unexpected in the data files. I just added the code for your third item.

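import glob

read_files = glob.glob("*.txt")

header_saved = False
linecnt = 0
with open("merged_data.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            # The first line of every input file is its header.
            header = next(infile)
            if not header_saved:
                # Write the header only once, from the first file.
                outfile.write(header)
                header_saved = True
            for line in infile:
                outfile.write(line)
                linecnt = linecnt + 1
                if (linecnt % 3) == 0:
                    # Bytes literal, because the file is opened in binary mode.
                    outfile.write(b"\n\n")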

inputfile1.txt

Employee,Account,Currency,Amount,Location 
Test 1, Basic,USD,3000,Airport 
Test 2, Net, USD,2000,Airport 
Test 3, Basic,USD,4000,Town 
Test 4, Net, USD,3000,Town 
Test 5, Basic,GBP,5000,Town 
Test 6, Net, GBP,4000,Town 

inputfile2.txt

Employee,Account,Currency,Amount,Location 
Test 8, Basic,USD,3000,Airport 
Test 9, Net, USD,2000,Airport 
Test 10, Basic,USD,4000,Town 
Test 11, Net, USD,3000,Town 
Test 12, Basic,GBP,5000,Town 
Test 13, Net, GBP,4000,Town 

Output

Employee,Account,Currency,Amount,Location 
Test 1, Basic,USD,3000,Airport 
Test 2, Net, USD,2000,Airport 
Test 3, Basic,USD,4000,Town 


Test 4, Net, USD,3000,Town 
Test 5, Basic,GBP,5000,Town 
Test 6, Net, GBP,4000,Town 

Test 8, Basic,USD,3000,Airport 
Test 9, Net, USD,2000,Airport 
Test 10, Basic,USD,4000,Town 


Test 11, Net, USD,3000,Town 
Test 12, Basic,GBP,5000,Town 
Test 13, Net, GBP,4000,Town 
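
The script above does not cover the sorting step from the question. The following is only a minimal sketch of one way to do it, under assumptions that are not stated in the thread: the files are tab-delimited, Specimen_ID is the first column, and IDs can be compared as plain strings. The function name `merge_sorted` and its parameters are hypothetical. Instead of counting every third row, it inserts the gap whenever the ID changes, so it still works if a specimen has more or fewer than three rows.

```python
def merge_sorted(paths, out_path, id_column=0, delimiter="\t", gap=2):
    """Merge files that share a header, sort rows by ID, and put
    `gap` blank lines between groups of rows with different IDs."""
    header = None
    rows = []
    for path in paths:
        # Text mode with default newline handling accepts \n, \r\n and \r.
        with open(path, "r") as infile:
            lines = [ln for ln in infile.read().splitlines() if ln.strip()]
        if not lines:
            continue  # skip empty files
        if header is None:
            header = lines[0]  # keep the header from the first non-empty file
        rows.extend(lines[1:])  # drop the header of every file

    # list.sort() is stable, so rows with equal IDs keep their input order.
    rows.sort(key=lambda ln: ln.split(delimiter)[id_column])

    with open(out_path, "w") as outfile:
        if header is not None:
            outfile.write(header + "\n")
        previous_id = None
        for row in rows:
            current_id = row.split(delimiter)[id_column]
            # Start a new group whenever the ID changes (not before the first row).
            if previous_id is not None and current_id != previous_id:
                outfile.write("\n" * gap)
            outfile.write(row + "\n")
            previous_id = current_id
```

If the inputs are collected with `glob.glob("*.txt")`, it is probably safer to exclude the output file from that list or to give it a name the pattern does not match, since a previous `merged_data.txt` would otherwise be picked up as an input on the next run.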

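I tried your version of the script, but it produced the same incorrect output that I had been getting, i.e. an outfile containing only the contents of the first infile. Also, the two new blank lines are not inserted between any rows. –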

I just ran the code with your sample files and it works fine. Can you check the extension of the second file on your machine? It may not be *.txt – Shijo

Hmm, that's strange. The files have the .txt extension and do contain content. Not sure why it works for you but not for me. –
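
Nothing in this thread confirms the cause, but one possibility worth ruling out is the line-ending convention of the Excel exports. Some Excel versions (reportedly older releases on Mac) can save tab-delimited text with carriage-return-only (`\r`) line endings. A file opened with `"rb"` is split into lines only at `\n`, so `next(infile)` would return the entire file as the "header". That would be consistent with the symptoms described: the first file is written whole, the other files are skipped as duplicate headers, and no blank lines are inserted. The sketch below uses two hypothetical helpers, `count_line_endings` and `read_lines_any_newline`, to check a file and to read it regardless of convention.

```python
def count_line_endings(path):
    """Return how many CRLF, lone LF and lone CR sequences a file contains."""
    with open(path, "rb") as f:
        data = f.read()
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # \n not preceded by \r
        "CR": data.count(b"\r") - crlf,  # \r not followed by \n
    }


def read_lines_any_newline(path):
    """Read a file as a list of byte lines, splitting on \\n, \\r\\n or \\r."""
    with open(path, "rb") as f:
        # bytes.splitlines() recognises all three conventions and drops the terminators.
        return f.read().splitlines()
```

If `count_line_endings` reports only `CR` for the exported files, replacing the line-by-line loop with `read_lines_any_newline` (and writing `line + b"\n"`), or re-saving the files with a different line-ending option, would be reasonable things to try.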
