Iterating over a dictionary while creating tuples

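I am learning Python and trying to work with maps and tuples. I created a dictionary from one parsed file and am parsing a second file. I want to go through the dictionary and replace the first element of each line of the parsed file with the ID obtained from the dictionary.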

My dictionary:

for line in blast_lines: 
    (transcript,swissProt,identity) = parse_blast(blast_line=line) 
    transcript_to_protein[transcript] = swissProt

Parsing the file, and creating a tuple with the replaced ID if an entry exists:

def parse_matrix(matrix_line): 
    matrixFields = matrix_line.rstrip("\n").split("\t") 
    protein = matrixFields[0] 
    if matrixFields[0] in transcript_to_protein: 
      protein = transcript_to_protein.get(transcript) 
      matrixFields[0] = protein 
    return(tuple(matrixFields))

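I haven't included all of my code here, because I believe the problem must be in how I go through the parsed file and the dictionary to build tuples whose first element is the value from the dictionary, but I will include everything at the bottom.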

Input:

BLAST (what is stored in the dictionary)

c1000_g1_i1|m.799 gi|48474761|sp|O94288.1|NOC3_SCHPO 100.00 747 0 0 5 751 1 747 0.0 1506

For this line the transcript is c1000_g1_i1 and the swissProt is O94288.1

Matrix (the file being parsed)

c3833_g1_i2 4.00 0.07 16.84 26.37

I want to replace the first field (matrixFields[0]) with the swissProt if the value in the first field matches a key (transcript) in the dictionary.

I want the output to look like this:

Q09748.1 4.00 0.07 16.84 26.37 
O60164.1 24.55 116.87 220.53 28.82 
C5161_G1_I1 107.49 89.39 26.95 698.97 
P36614.1 27.91 72.57 5.56 36.58 
P37818.1 82.57 19.03 48.55 258.22

But I am getting this:

O94423.1 4.00 0.07 16.84 26.37 
O94423.1 24.55 116.87 220.53 28.82 
C5161_G1_I1 107.49 89.39 26.95 698.97 
O94423.1 27.91 72.57 5.56 36.58 
O94423.1 82.57 19.03 48.55 258.22

Note how all four replaced lines have the same value, instead of the individual value the dictionary holds for each transcript.

Full code:

transcript_to_protein = {}; 

def parse_blast(blast_line="NA"): 
    fields = blast_line.rstrip("\n").split("\t") 
    queryIdString = fields[0] 
    subjectIdString = fields[1] 
    identity = fields[2] 
    queryIds = queryIdString.split("|") 
    subjectIds = subjectIdString.split("|") 
    transcript = queryIds[0].upper() 
    swissProt = subjectIds[3] 
    base = swissProt.split(".")[0] 
    return(transcript, swissProt, identity) 

blast_output = open("/scratch/RNASeq/blastp.outfmt6") 
blast_lines = blast_output.readlines() 

for line in blast_lines: 
    (transcript,swissProt,identity) = parse_blast(blast_line=line) 
    transcript_to_protein[transcript] = swissProt 

def parse_matrix(matrix_line): 
    matrixFields = matrix_line.rstrip("\n").split("\t") 
    matrixFields[0] = matrixFields[0].upper() 
    protein = matrixFields[0] 
    if matrixFields[0] in transcript_to_protein: 
      protein = transcript_to_protein.get(transcript) 
      matrixFields[0] = protein 
    return(tuple(matrixFields)) 

def tuple_to_tab_sep(one_tuple): 
    tab = "\t" 
    return tab.join(one_tuple) 

matrix = open("/scratch/RNASeq/diffExpr.P1e-3_C2.matrix") 

newline = "\n" 

list_of_de_tuples = map(parse_matrix,matrix.readlines()) 

list_of_tab_sep_lines = map(tuple_to_tab_sep, list_of_de_tuples) 
print(newline.join(list_of_tab_sep_lines))


2016-12-16 Jamie Leigh

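First, there is a bug in parse_blast(): it doesn't return the tuple (transcript, swissProt, identity) but rather (transcript, base, identity), and base doesn't contain the missing information.

Update

Second, there is also a bug in parse_matrix(). The first field read from the file doesn't have the missing information in it; however, that is what ends up in the returned tuple when matrixFields[0] is in the transcript_to_protein dictionary.

Fixing just one of them won't solve the problem by itself.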


2016-12-16 18:42:15 martineau

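With that fix it still prints the same value for all of the replaced fields, instead of looking each one up in the dictionary as needed; it prints the last value for all of them.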

It looks like the problem may be in the parse_blast function. For the line

c1000_g1_i1|m.799 gi|48474761|sp|O94288.1|NOC3_SCHPO 100.00 747 0 0 5 751 1 747 0.0 1506 

subjectIdString = fields[1]

so subjectIdString will be gi|48474761|sp|O94288.1|NOC3_SCHPO

Then

swissProt = subjectIds[3]

swissProt will be O94288.1, which the function then splits further on "." in the line

base = swissProt.split(".")[0]

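The end result will be O94288 instead of the O94288.1 you seem to be expecting. I would suggest testing the function on single-line input until you get the desired output.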
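To make the splitting steps above concrete, here is a small sketch that runs them on the sample line. It assumes the real outfmt6 file is tab-separated (the sample is rebuilt with explicit tabs and truncated to its first three fields), and the snake_case variable names are only illustrative.

```python
# Sample BLAST line rebuilt with explicit tabs (assumption: the real
# file is tab-separated); only the first three fields are kept.
blast_line = "\t".join([
    "c1000_g1_i1|m.799",
    "gi|48474761|sp|O94288.1|NOC3_SCHPO",
    "100.00",
]) + "\n"

fields = blast_line.rstrip("\n").split("\t")
transcript = fields[0].split("|")[0].upper()  # query ID before the first "|"
swiss_prot = fields[1].split("|")[3]          # fourth "|"-separated subject part
base = swiss_prot.split(".")[0]               # accession without the version

print(transcript)  # C1000_G1_I1
print(swiss_prot)  # O94288.1
print(base)        # O94288
```

Whichever of swiss_prot or base is returned decides whether the ".1" version suffix survives.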


2016-12-16 18:47:18

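It works fine for a single line; the problem is that it prints the same swissprot id for all lines instead of the one matching each key in the dictionary.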

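The error was in my dictionary lookup. I wanted to match matrixFields[0] against the transcript keys in the dictionary, but I was testing with if matrixFields[0] in transcript_to_protein: and then looking up transcript, instead of first assigning the field to transcript: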

transcript = matrixFields[0] 
if transcript in transcript_to_protein: 
    protein = transcript_to_protein.get(transcript)
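As far as the posted code shows, transcript inside the original parse_matrix was never assigned locally, so it resolved to the module-level variable left behind by the for line in blast_lines loop, which is why every matching line received the swissProt of the last BLAST entry. Below is a minimal sketch of the corrected function. The two-entry dictionary is hypothetical sample data standing in for the one built from the BLAST file, and the dict.get default is a stylistic choice rather than part of the original fix.

```python
# Hypothetical sample mapping; the real one is built from the BLAST file.
transcript_to_protein = {
    "C3833_G1_I2": "Q09748.1",
    "C1000_G1_I1": "O94288.1",
}

def parse_matrix(matrix_line):
    matrix_fields = matrix_line.rstrip("\n").split("\t")
    # Take the key from this line, not from a leftover global variable.
    transcript = matrix_fields[0].upper()
    # Fall back to the upper-cased transcript ID when there is no match.
    matrix_fields[0] = transcript_to_protein.get(transcript, transcript)
    return tuple(matrix_fields)

print(parse_matrix("c3833_g1_i2\t4.00\t0.07\t16.84\t26.37\n"))
print(parse_matrix("c5161_g1_i1\t107.49\t89.39\t26.95\t698.97\n"))
```

The first call replaces the ID with Q09748.1; the second has no dictionary entry, so the upper-cased transcript ID is kept.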


2016-12-16 19:07:38
