2013-03-30 20 views
0

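I have two input files, input.txt and datainput.txt. I check whether the second column of input.txt matches the first column of datainput.txt, and if they match, I append its orthodb_id to the end of the relevant line in the output file. How can I enhance my dataset output using Python?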

input.txt:

5 21 218 
6 11 1931 
7 26 173 

datainput.txt:

>21|95|28|5 
Computer 
>11|28|5|5 
Cate 

code.py:

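import csv

# Python 2 idiom: files used with the csv module are opened in binary mode.
with open('input.txt', 'rb') as file1:
    # Map column 2 of each non-blank line to column 1, e.g. '21' -> '5'.
    file1_data = dict(line.split(None, 2)[1::-1] for line in file1 if line.strip())

with open('data.txt', 'rb') as file2, open('output.txt', 'wb') as outputfile:
    output = csv.writer(outputfile, delimiter='|')
    for line in file2:
        # Only header lines (starting with '>') are processed.
        if line[:1] == '>':
            row = line.strip().split('|')
            key = row[0][1:]  # first field without the leading '>'
            if key in file1_data:
                output.writerow(row + [file1_data[key]])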

This is the output I get with my code:

>21|95|28|5|5 
>11|28|5|5|6 
+0

You would be better off using BioPython to read the FASTA-format input (the datainput file). Or see [fastareader](http://stackoverflow.com/questions/7654971/parsing-a-fasta-file-using-a-generator-python) for a very naive example! –
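To illustrate the "very naive" reader idea mentioned in the comment, here is a minimal sketch of a generator that yields (header, sequence) pairs from FASTA-style text. It is not BioPython's parser and not the code from the linked question; the function name `read_fasta` is hypothetical, Python 3 is assumed, and the sample text is the datainput.txt content from the question.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-style text stream.

    Hypothetical helper for illustration: the header is returned without
    its leading '>', and multi-line sequences are joined into one string.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, ''.join(chunks)


# Sample data copied from datainput.txt in the question.
sample = io.StringIO(">21|95|28|5\nComputer\n>11|28|5|5\nCate\n")
records = list(read_fasta(sample))
for header, sequence in records:
    # The first '|'-separated header field is the key matched against input.txt.
    print(header.split('|')[0], sequence)
```

Reading whole records this way keeps each header together with its sequence, which makes it easier to decide per record whether to write both lines or neither.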

Answers

1

You just need to add an else block to your code to get the desired output:

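import csv

# Python 2 idiom: files used with the csv module are opened in binary mode.
with open('input.txt', 'rb') as file1:
    # Map column 2 of each non-blank line to column 1, e.g. '21' -> '5'.
    file1_data = dict(line.split(None, 2)[1::-1] for line in file1 if line.strip())

with open('data.txt', 'rb') as file2, open('output.txt', 'wb') as outputfile:
    output = csv.writer(outputfile, delimiter='|')
    for line in file2:
        if line[:1] == '>':
            row = line.strip().split('|')
            key = row[0][1:]  # first field without the leading '>'
            if key in file1_data:
                output.writerow(row + [file1_data[key]])
        else:
            # Non-header lines are copied to the output unchanged.
            outputfile.write(line)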
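The code above is written for Python 2 (binary file modes with the csv module). As a rough sketch of the same logic under Python 3, the version below works on text streams and joins the fields with '|' directly instead of using csv.writer. The function name `annotate` and the in-memory sample streams are assumptions for illustration; lines end with '\n' here, whereas csv.writer defaults to '\r\n'.

```python
import io


def annotate(input_lines, data_lines, out):
    """Copy data_lines to out, appending the matching id to header lines.

    Hypothetical Python 3 sketch of the answer's logic: header lines
    without a match are skipped, exactly as in the original code.
    """
    # Map column 2 of input.txt to column 1, e.g. '21' -> '5'.
    ids = {}
    for line in input_lines:
        fields = line.split()
        if len(fields) >= 2:
            ids[fields[1]] = fields[0]

    for line in data_lines:
        if line.startswith('>'):
            row = line.strip().split('|')
            key = row[0][1:]  # first field without the leading '>'
            if key in ids:
                out.write('|'.join(row + [ids[key]]) + '\n')
        else:
            # Non-header lines are copied unchanged.
            out.write(line)


# Sample data copied from the question.
input_txt = io.StringIO("5 21 218\n6 11 1931\n7 26 173\n")
data_txt = io.StringIO(">21|95|28|5\nComputer\n>11|28|5|5\nCate\n")
result = io.StringIO()
annotate(input_txt, data_txt, result)
print(result.getvalue(), end='')
```

With real files, the same function could be called with handles opened in text mode, for example `open('input.txt')`, since it only iterates over lines and calls `write`.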
+2

Once you are happy with this solution, submit it to http://codereview.stackexchange.com for some useful suggestions on how to improve the approach. – Tshepang