2012-08-22 50 views

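Extract specific entries from a blastx output file and write them to a new file

I have written a script that successfully searches a Blastx output file in XML format for a user-specified keyword. Now I need to write the records (query, hit, score, e-value, etc.) whose alignment title contains the keyword to a new file.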

I have created separate lists for the query titles, hit titles, e-values, and alignment lengths, but I can't seem to write them to a new file.

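  • Question 1: What happens if Python hits an error and one of the lists ends up missing a value? All the other lists would then give wrong information about the query ("line slippage", if you will).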

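  • Question 2: Even if Python raises no error and all the lists are the same length, how do I write them to a file so that the first item of each list stays associated with the others (and likewise item #10 of each list)? Should I create a dictionary?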

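  • Question 3: A dictionary holds only one value per key, so what happens if my query has several different hits? I don't know whether the entry would be overwritten, skipped, or simply raise an error. Any suggestions?

My current script: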

    from Bio.Blast import NCBIWWW 
    from Bio.Blast import NCBIXML 
    import re 
    
    #obtain full path to blast output file (*.xml) 
    outfile = input("Full path to Blast output file (XML format only): ") 
    
    #obtain string to search for 
    search_string = input("String to search for: ") 
    
    #open the output file 
    result_handle = open(outfile) 
    
    #parse the blast record 
    blast_records = NCBIXML.parse(result_handle) 
    
    #initialize lists 
    query_list=[] 
    hit_list=[] 
    expect_list=[] 
    length_list=[] 
    
    #create 'for loop' that loops through each HIGH SCORING PAIR in each ALIGNMENT from each RECORD 
    for record in blast_records: 
         for alignment in record.alignments:  #for description in record.descriptions??? 
           for hsp in alignment.hsps:  #for title in description.title??? 
    
             #search for designated string 
             search = re.search(search_string, alignment.title) 
    
             #if search comes up with nothing, end 
             if search is None: 
               print ("Search string not found.") 
               break 
    
             #if search comes up with something, add it to a list of entries that match search string 
             else: 
    
               #option to include an 'exception' (if it finds keyword then DOES NOT add that entry to list) 
               if search is "trichomonas" or "entamoeba" or "arabidopsis": 
                 print ("found exception.") 
                 break 
               else: 
    
                 query_list.append(record.query) 
                 hit_list.append(alignment.title) 
                 expect_list.append(expect_val) 
                 length_list.append(length) 
    
                 #explicitly convert 'variables' ['int' object or 'float'] to strings 
                 length = str(alignment.length) 
                 expect_val = str(hsp.expect) 
    
                 #print ("\nquery name: " + record.query) 
                 #print ("alignment title: " + alignment.title) 
                 #print ("alignment length: " + length) 
                 #print ("expect value: " + expect_val) 
                 #print ("\n***Alignment***\n") 
                 #print (hsp.query) 
                 #print (hsp.match) 
                 #print (hsp.sbjct + "\n\n") 
    
    
                 if query_len is not hit_len is not expect_len is not length_len: 
                   print ("list lengths don't match!") 
                   break 
                 else: 
    
                   qrylen = len(query_list) 
                   query_len = str(qrylen) 
                   hitlen = len(hit_list) 
                   hit_len = str(hitlen) 
                   expectlen = len(expect_list) 
                   expect_len = str(expectlen) 
                   lengthlen = len(length_list) 
                   length_len = str(lengthlen) 
                   outpath = str(outfile) 
    
                   #create new file 
                   outfile = open("__Blast_Parse_Search.txt", "w") 
                   outfile.write("File contains entries from [" + outpath + "] that contain [" + search_string + "]") 
                   outfile.close 
    
                   #write list to file 
                   i = 0 
                   list_len = int(query_len) 
                   for i in range(0, list_len): 
    
                     #append new file 
                     outfile = open("__Blast_Parse_Search.txt", "a") 
                     outfile.writelines(query_list + hit_list + expect_list + length_list) 
                     i = i + 1 
    
                   #write to disk, close file 
                   outfile.flush() 
                   outfile.close 
    
    print ("query list length " + query_len) 
    print ("hit list length " + hit_len) 
    print ("expect list length " + expect_len) 
    print ("length list length " + length_len + "\n\n") 
    print ("first record: " + query_list[0] + " " + hit_list[0] + " " + expect_list[0] + " " + length_list[0]) 
    print ("last record: " + query_list[-1] + " " + hit_list[-1] + " " + expect_list[-1] + " " + length_list[-1]) 
    print ("\nFinished.\n") 
    

Answer


If I understand your question correctly, you could handle the line slippage by appending a default value, something like:

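    # x and append_default_value are placeholders, not real functions
    try:
        x(some_list)                      # the step that may raise
    except Exception:                     # catch the built-in Exception class
        append_default_value(some_list)   # append a default so the lists stay aligned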

http://docs.python.org/tutorial/errors.html#handling-exceptions
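As a rough illustration of the default-value idea, the sketch below appends exactly one item to every list on every iteration, so the indexes cannot drift apart. The helper `field_or_default` and the stand-in HSP objects are hypothetical; only the attribute name `expect` comes from the script in the question, and `align_length` is just an illustrative field name.

```python
from types import SimpleNamespace


def field_or_default(obj, name, default="NA"):
    """Return str(obj.<name>), or `default` if the attribute is missing."""
    try:
        return str(getattr(obj, name))
    except AttributeError:
        return default


# Stand-ins for parsed HSP objects (made-up values, not real BLAST output).
hsps = [
    SimpleNamespace(expect=1e-30, align_length=120),
    SimpleNamespace(align_length=85),  # 'expect' is missing on purpose
]

expect_list = []
length_list = []
for hsp in hsps:
    # One append per list per iteration: index i always refers
    # to the same HSP in both lists, even when a value is missing.
    expect_list.append(field_or_default(hsp, "expect"))
    length_list.append(field_or_default(hsp, "align_length"))

print(expect_list)
print(length_list)
```

The point of the design is that the fallback lives inside the lookup, so a failure can never cause one list to be skipped while the others grow.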

Alternatively, use tuples as dictionary keys, for example (0, 1, 1), and use the dict get method to supply a default value.

http://docs.python.org/py3k/library/stdtypes.html#mapping-types-dict
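A minimal sketch of the tuple-key idea, assuming a (query, hit) pair is what identifies a record; the names and values are made up for illustration. It also shows what happens with a repeated key: assigning to an existing key silently overwrites the old value, it is neither skipped nor an error.

```python
# Keys are (query, hit) tuples, so one query can have many hits.
hits = {}
hits[("query_1", "hit_A")] = {"evalue": 1e-30, "length": 120}
hits[("query_1", "hit_B")] = {"evalue": 2e-05, "length": 85}
hits[("query_2", "hit_A")] = {"evalue": 0.001, "length": 60}

# Assigning to the same key again replaces the earlier value.
hits[("query_2", "hit_A")] = {"evalue": 0.002, "length": 61}

# dict.get returns the supplied default instead of raising KeyError.
default = {"evalue": None, "length": None}
missing = hits.get(("query_3", "hit_Z"), default)

# All hits that belong to one query, in insertion order.
query_1_hits = [hit for (query, hit) in hits if query == "query_1"]

print(len(hits), missing, query_1_hits)
```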

If you need to preserve the data structure in your output file, you could try shelve:
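For instance, a small sketch with the standard-library `shelve` module, which stores picklable Python objects under string keys. The file name `blast_hits` and the sample records are assumptions for illustration only.

```python
import os
import shelve
import tempfile

# Made-up data: each query maps to a list of (hit, evalue, length) tuples.
records = {
    "query_1": [("hit_A", 1e-30, 120), ("hit_B", 2e-05, 85)],
    "query_2": [("hit_C", 0.001, 60)],
}

# Write to a temporary directory so the sketch leaves no files behind in cwd.
path = os.path.join(tempfile.mkdtemp(), "blast_hits")

with shelve.open(path) as db:
    for query, hit_rows in records.items():
        db[query] = hit_rows  # shelve keys must be strings

# Reopen later and get the same structure back, tuples included.
with shelve.open(path) as db:
    restored = db["query_1"]
    stored_keys = sorted(db.keys())

print(restored)
```

The shelf is a binary file meant to be read back by Python, so it suits reloading results later rather than producing a human-readable report.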

Or you could append some kind of reference after each record and give each record a unique ID, for example '#32{somekey:value}#21#22#44#'.
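One simple way to give each record a unique ID is to write one tab-separated line per record with a running number in the first column. This is only a sketch of that idea, not the exact '#32{...}' format above; `write_rows` is a hypothetical helper and the list contents are made up. It writes to an in-memory buffer here, but any file opened for writing in text mode would work the same way.

```python
import csv
import io

query_list = ["query_1", "query_1", "query_2"]
hit_list = ["hit_A", "hit_B", "hit_C"]
expect_list = ["1e-30", "2e-05", "0.001"]
length_list = ["120", "85", "60"]


def write_rows(handle, *columns):
    """Write parallel lists as tab-separated rows, one record per line."""
    if len({len(column) for column in columns}) != 1:
        raise ValueError("list lengths don't match")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "query", "hit", "evalue", "length"])
    # zip pairs item i of every list; enumerate supplies the unique ID.
    for record_id, row in enumerate(zip(*columns), start=1):
        writer.writerow([record_id, *row])


buffer = io.StringIO()
write_rows(buffer, query_list, hit_list, expect_list, length_list)
output = buffer.getvalue()
print(output)
```

Because `zip` stops at the shortest input, the explicit length check is what turns a silent truncation into a visible error.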

Again, you can use a tuple to combine multiple keys.
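To tie this back to the script in the question, here is a hedged sketch that collects one tuple per matching HSP instead of four parallel lists, so a record is either stored whole or not at all. The function name `collect_matches` and the stand-in objects are hypothetical; the attribute names (`query`, `alignments`, `title`, `length`, `hsps`, `expect`) are the ones the question's script already uses, so in principle the iterator returned by `NCBIXML.parse` could be passed in place of the stand-ins.

```python
import re
from types import SimpleNamespace


def collect_matches(records, search_string, excluded=()):
    """Return one (query, title, evalue, length) tuple per matching HSP."""
    rows = []
    for record in records:
        for alignment in record.alignments:
            title = alignment.title
            if re.search(search_string, title) is None:
                continue  # keyword absent: move on to the next alignment
            if any(word in title.lower() for word in excluded):
                continue  # keyword present, but the title is excluded
            for hsp in alignment.hsps:
                rows.append((record.query, title, hsp.expect, alignment.length))
    return rows


# Stand-in objects with made-up values, shaped like the parsed records.
hsp = SimpleNamespace(expect=1e-30)
records = [
    SimpleNamespace(
        query="query_1",
        alignments=[
            SimpleNamespace(title="kinase [Homo sapiens]", length=120, hsps=[hsp]),
            SimpleNamespace(title="kinase [Entamoeba histolytica]", length=95, hsps=[hsp]),
            SimpleNamespace(title="actin [Homo sapiens]", length=300, hsps=[hsp]),
        ],
    )
]

rows = collect_matches(
    records, "kinase", excluded=("trichomonas", "entamoeba", "arabidopsis")
)
print(rows)
```

Using `continue` rather than `break` keeps the loop going after a non-matching title, and testing each excluded word with `in` avoids the always-true `search is "a" or "b"` comparison.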

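I'm not sure whether any of this helps. Could you explain which part of your code is giving you trouble? Something like: x() gives me output y, but I expected z.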
