2016-06-28 106 views

Nested loops over lists: not getting the desired output

What I want to do is estimate the score of each peptide, i.e. each row. My code is as follows:

import csv, math 

def train_data(fname): 
     #load csv training files 
     peptide= [] 
     allele= [] 
     score = [] 
     with open (fname) as train: 
       reader = csv.DictReader(train, delimiter='\t') 
       for row in reader: 
         peptide.append(row['peptide']) 
         allele.append(row['allele']) 
         score.append(row['score']) 

     return [peptide, allele, score] 

def ff(): 
     peptide, allele, score = train_data('sample.txt') 
     p={'A':(0.074+0.077)/2, 'R':(0.052+0.053)/2, 'N':(0.045+0.044)/2, 'D':(0.054+0.051)/2, 'C':(0.025+0.022)/2, 'Q':(0.034+0.035)/2, 'E':(0.054+0.056)/2, 'G':(0.074+0.074)/2, 'H':(0.026+0.025)/2, 'I':(0.068+0.064)/2, 'L':(0.099+0.096)/2, 'K':(0.058+0.058)/2, 'M':(0.025+0.024)/2, 'F':(0.047+0.048)/2, 'P':(0.039+0.041)/2, 'S':(0.057+0.059)/2, 'T':(0.051+0.053)/2, 'W':(0.013+0.014)/2, 'Y':(0.032+0.033)/2, 'V':(0.073+0.072)/2} 
     for i in range(len(peptide)): 
#    peptide[i]=list(peptide[i]) 
       peptide.append(peptide[i]) 
       for j in range(len(peptide[i])): 
         print(peptide[2][j]) 
         #est_score+=p[peptide[i][j]] 
       print ('---') 
     print(peptide[2][1]) 

if __name__=='__main__': 

     ff() 

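When I run this code, the print statement inside the loop outputs the values for all peptides, i.e. peptide[i][j], but what I want is only the peptide[2][j] values. Outside the loop it works fine: print(peptide[2][1]) gives exactly the expected output, i.e. the value 'A'.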

My CSV file looks like this:

peptide score allele 
AAAGAEAGKATTEEQ 0.190842 DRB1_0101 
AAAGAEAGKATTEEQ 0.006301 DRB1_0301 
AAAGAEAGKATTEEQ 0.066851 DRB1_0401 
AAAGAEAGKATTEEQ 0.006344 DRB1_0405 
AAAGAEAGKATTEEQ 0.035130 DRB1_0701 
AAAGAEAGKATTEEQ 0.006288 DRB1_0802 
AAAGAEAGKATTEEQ 0.176268 DRB1_0901 
AAAGAEAGKATTEEQ 0.042555 DRB1_1101 
AAAGAEAGKATTEEQ 0.114855 DRB1_1302 
AAAGAEAGKATTEEQ 0.006377 DRB1_1501 
AAAGAEAGKATTEEQ 0.006296 DRB3_0101 
AAAGAEAGKATTEEQ 0.006313 DRB4_0101 
AAAGAEAGKATTEEQ 0.070413 DRB5_0101 

What I want to do is estimate the score of each peptide, i.e. each row, not of all rows together, using: est_score += p[peptide[i][j]]


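peptide[i] is a string. `for j in range(len(peptide[i]))` loops over the values of j, but what you print is each individual character of peptide[2], not anything related to peptide[i].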
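A minimal sketch of the difference this comment describes. The three peptides below are hypothetical, made up for illustration and kept at equal length, and the two helper functions are not part of the asker's code. With the fixed index 2 every pass of the outer loop reads the third row; with the loop variable `i` each row yields its own characters.

```python
# Hypothetical peptides of equal length, made up for illustration.
peptides = ["AAAGA", "KLMNP", "WYVRD"]

def rows_with_fixed_index(rows):
    """Mirror the question's loop: the inner lookup always reads rows[2]."""
    out = []
    for i in range(len(rows)):
        out.append("".join(rows[2][j] for j in range(len(rows[i]))))
    return out

def rows_with_loop_index(rows):
    """Use the loop variable so each row yields its own characters."""
    out = []
    for i in range(len(rows)):
        out.append("".join(rows[i][j] for j in range(len(rows[i]))))
    return out

fixed = rows_with_fixed_index(peptides)
per_row = rows_with_loop_index(peptides)
print(fixed)    # the third peptide, repeated once per row
print(per_row)  # each peptide in turn
```

In the question's file every row holds the same peptide, so the two variants would print identical characters there, which probably made the problem harder to spot.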


Can you tell me what to do if I want to calculate the score for each row separately? What it does now is calculate the score over all rows.


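I'm not sure I understand what you mean by "calculate the score for each row separately". Your file already seems to have one score per row. What is there to calculate? Your `for i in range(len(peptide))` loop iterates over every row. So on the first pass through the loop, peptide[i] = AAAGA..., score[i] = 0.190842 and allele[i] = DRB1_0101. I don't know what you are trying to do with these values.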

Answers

import csv

# Per-residue weights; each value is the average of the two numbers given in the question.
p={'A':(0.074+0.077)/2, 'R':(0.052+0.053)/2, 'N':(0.045+0.044)/2, 'D':(0.054+0.051)/2, 'C':(0.025+0.022)/2, 'Q':(0.034+0.035)/2, 'E':(0.054+0.056)/2, 'G':(0.074+0.074)/2, 'H':(0.026+0.025)/2, 'I':(0.068+0.064)/2, 'L':(0.099+0.096)/2, 'K':(0.058+0.058)/2, 'M':(0.025+0.024)/2, 'F':(0.047+0.048)/2, 'P':(0.039+0.041)/2, 'S':(0.057+0.059)/2, 'T':(0.051+0.053)/2, 'W':(0.013+0.014)/2, 'Y':(0.032+0.033)/2, 'V':(0.073+0.072)/2}

def train_data(fname):
    # Load the tab-delimited training file into three parallel lists.
    peptide = []
    allele = []
    score = []
    with open(fname) as train:
        reader = csv.DictReader(train, delimiter='\t')
        for row in reader:
            peptide.append(row['peptide'])
            allele.append(row['allele'])
            score.append(row['score'])

    return [peptide, allele, score]

def ff():
    peptide, allele, score = train_data('peptide.txt')
    for i in range(len(peptide)):
        # Reset the accumulator for every row so scores do not add up across rows.
        est_score = 0
        # Iterate over the characters of the current row, not of a fixed row.
        for char in peptide[i]:
            est_score += p[char]
        print("est_score: " + str(est_score), "\t: read_score: " + str(score[i]))
        print('---')
    print(peptide[2][1])

if __name__ == '__main__':
    ff()

est_score is always the same because, in the file you provided, the peptide is identical in every row. This prints:

est_score: 0.9625000000000001 : read_score: 0.190842 
--- 
est_score: 0.9625000000000001 : read_score: 0.006301 
--- 
est_score: 0.9625000000000001 : read_score: 0.066851 
--- 
est_score: 0.9625000000000001 : read_score: 0.006344 
--- 
est_score: 0.9625000000000001 : read_score: 0.035130 
--- 
est_score: 0.9625000000000001 : read_score: 0.006288 
--- 
est_score: 0.9625000000000001 : read_score: 0.176268 
--- 
est_score: 0.9625000000000001 : read_score: 0.042555 
--- 
est_score: 0.9625000000000001 : read_score: 0.114855 
--- 
est_score: 0.9625000000000001 : read_score: 0.006377 
--- 
est_score: 0.9625000000000001 : read_score: 0.006296 
--- 
est_score: 0.9625000000000001 : read_score: 0.006313 
--- 
est_score: 0.9625000000000001 : read_score: 0.070413 
--- 
A 
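The per-row scoring in this answer can also be sketched as a small function, which makes it easier to see that the estimate depends only on the peptide string. This is an illustrative variant, not the answerer's code: the names `P`, `SAMPLE`, `estimate_score` and `score_rows` are assumptions, the input is read from an in-memory string rather than a file, and the second data row (`GGGGG`) is made up to show a different peptide producing a different estimate.

```python
import csv
import io

# Same per-residue weights as the dictionary `p` in the answer.
P = {
    'A': (0.074 + 0.077) / 2, 'R': (0.052 + 0.053) / 2, 'N': (0.045 + 0.044) / 2,
    'D': (0.054 + 0.051) / 2, 'C': (0.025 + 0.022) / 2, 'Q': (0.034 + 0.035) / 2,
    'E': (0.054 + 0.056) / 2, 'G': (0.074 + 0.074) / 2, 'H': (0.026 + 0.025) / 2,
    'I': (0.068 + 0.064) / 2, 'L': (0.099 + 0.096) / 2, 'K': (0.058 + 0.058) / 2,
    'M': (0.025 + 0.024) / 2, 'F': (0.047 + 0.048) / 2, 'P': (0.039 + 0.041) / 2,
    'S': (0.057 + 0.059) / 2, 'T': (0.051 + 0.053) / 2, 'W': (0.013 + 0.014) / 2,
    'Y': (0.032 + 0.033) / 2, 'V': (0.073 + 0.072) / 2,
}

# First data row is from the question; the second row is hypothetical.
SAMPLE = (
    "peptide\tscore\tallele\n"
    "AAAGAEAGKATTEEQ\t0.190842\tDRB1_0101\n"
    "GGGGG\t0.5\tDRB1_0101\n"
)

def estimate_score(peptide, weights=P):
    """Sum the weight of every residue in one peptide string."""
    return sum(weights[residue] for residue in peptide)

def score_rows(handle):
    """Return (peptide, allele, estimated score, read score) for each row."""
    reader = csv.DictReader(handle, delimiter='\t')
    return [
        (row['peptide'], row['allele'],
         estimate_score(row['peptide']), float(row['score']))
        for row in reader
    ]

rows = score_rows(io.StringIO(SAMPLE))
for peptide, allele, est, read in rows:
    print(peptide, allele, round(est, 4), read)
```

Because the total is computed inside `estimate_score`, nothing carries over from one row to the next. A peptide containing a character that is not a key of `P` would raise a `KeyError` in this sketch.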

For me it prints only peptide[2][j], but it prints it many times. Is that what you want?

A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 
A 
A 
G 
A 
E 
A 
G 
K 
A 
T 
T 
E 
E 
Q 
--- 
A 

Python 2 and Python 3 both gave me the same result.


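{'T': 0.052, 'K': 0.058, 'G': 0.074, 'Q': 0.0345, 'D': 0.0525, 'E': 0.055, 'M': 0.0245, 'C': 0.0235, 'W': 0.0135, 'I': 0.066, 'L': 0.0975, 'S': 0.057999999999999996, 'P': 0.04, 'F': 0.0475, 'A': 0.0755, 'V': 0.0725, 'N': 0.0445, 'R': 0.0525, 'Y': 0.0325, 'H': 0.025500000000000002} I want to use the dictionary, i.e. the values above, to predict/calculate the score (estimated_score -> est_score) for each peptide, i.e. each row. But what the code does is add up the scores for all peptides, i.e. all rows.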