2016-10-11 84 views
4

How do I pass a list of lists through a for loop in Python?

I have a list of lists:

sample = [['TTTT', 'CCCZ'], ['ATTA', 'CZZC']] 
count = [[4,3],[4,2]] 
correctionfactor = [[1.33, 1.5],[1.33,2]] 

I calculate the frequency of each character (pi), square it, and then sum the squares (and then I calculate het = 1 - sum).
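For a single sequence, the calculation described above can be sketched as follows. This is a minimal illustration written for Python 3, not the original code: the function name het_for_sequence is hypothetical, and it assumes every character in the sequence counts toward the total (no special handling of 'Z').

```python
def het_for_sequence(seq):
    """Return 1 - sum(p_i ** 2), where p_i is the frequency of character i in seq."""
    # Count how many times each character occurs.
    count_dict = {}
    for base in seq:
        count_dict[base] = count_dict.get(base, 0) + 1

    total = len(seq)  # assumption: every character contributes to the total
    # Square each frequency and add the squares together.
    pf = sum((n / total) ** 2 for n in count_dict.values())
    return 1 - pf


print(het_for_sequence('ATTA'))  # 0.5
print(het_for_sequence('TTTT'))  # 0.0
```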

The desired output [[1,2],[1,2]] #NOTE: This is NOT the real values of expected output. I just need the real values to be in this format. 

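Problem: I don't know how to pass the lists (sample, count) through this loop to extract the values I need. I have previously only used this code with a single list (e.g. ['TACT','TTTT'..]).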

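  • I suspect I need to add a larger outer loop that indexes over each element of sample (i.e. over sample[0] = ['TTTT', 'CCCZ'] and sample[1] = ['ATTA', 'CZZC']). I don't know how to write that in code.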
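One way such an outer loop could look, sketched for Python 3 under the assumption that sample and count always have the same nested shape: walk the two lists in parallel with zip, first over the sublists and then over their items. The names paired, row, seqs, counts, seq and n are illustrative only.

```python
sample = [['TTTT', 'CCCZ'], ['ATTA', 'CZZC']]
count = [[4, 3], [4, 2]]

paired = []
for seqs, counts in zip(sample, count):   # outer loop: one sublist at a time
    row = []
    for seq, n in zip(seqs, counts):      # inner loop: one sequence and its count
        row.append((seq, n))
    paired.append(row)

print(paired)
# [[('TTTT', 4), ('CCCZ', 3)], [('ATTA', 4), ('CZZC', 2)]]
```

Inside the inner loop, n is a plain integer, so dividing a character count by n is a number divided by a number. Appending one row per sublist keeps the result in the nested [[.., ..], [.., ..]] shape.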

Code:

list_of_hets = []
for idx, element in enumerate(sample):
    count_dict = {}
    square_dict = {}
    for base in list(element):
        if base in count_dict:
            count_dict[base] += 1
        else:
            count_dict[base] = 1
    for allele in count_dict:  # Calculate frequency of every character
        square_freq = (count_dict[allele]/count[idx])**2  # Square the frequencies
        square_dict[allele] = square_freq
    pf = 0.0
    for i in square_dict:
        pf += square_dict[i]  # pf --> pi^2 + pj^2...pn^2 (sum the squared frequencies)
    het = 1-pf
    list_of_hets.append(het)
print list_of_hets

"Failed" OUTPUT: 
line 70, in <module> 
square_freq = (count_dict[allele]/count[idx])**2 
TypeError: unsupported operand type(s) for /: 'int' and 'list'
+1

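The error message tells you exactly what is wrong: 'square_freq = (count_dict[allele]/count[idx])**2' is raising "TypeError: unsupported operand type(s) for /: 'int' and 'list'". You can't divide an 'int' by a 'list'. By the way, that doesn't match the code you wrote, which would probably raise a different 'TypeError' when you try to pass count[idx] to 'float'. –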
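The point made in this comment can be checked in isolation. The snippet below is a small Python 3 sketch that reuses the count list from the question; idx and error_message are illustrative names. It shows that count[idx] is a whole sublist, while one more index reaches an integer that can be used as a divisor.

```python
count = [[4, 3], [4, 2]]
idx = 0

print(type(count[idx]).__name__)     # list
print(type(count[idx][1]).__name__)  # int

try:
    2 / count[idx]                   # an int divided by a list is not defined
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

print(2 / count[idx][1])             # an int divided by an int works
```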

+0

I would like to use a zip command, such as 'square_freq = [[n/d for n,d in zip(subq,subr)] for subq,subr in zip(count_dict[allele],counts)]'. But I still get errors. Any other suggestions? – biogeek

+0

@PM2Ring I have corrected it. Thanks for pointing it out. – biogeek

Answer

3

I'm not entirely clear on how you want to handle the 'Z' items in your data, but this code reproduces the output for the sample data:

from __future__ import division 

bases = set('ACGT') 
#sample = [['TTTT', 'CCCZ'], ['ATTA', 'CZZC']] 
sample = [['ATTA', 'TTGA'], ['TTCA', 'TTTA']] 

list_of_hets = [] 
for element in sample:
    hets = []
    for seq in element:
        count_dict = {}
        for base in seq:
            if base in count_dict:
                count_dict[base] += 1
            else:
                count_dict[base] = 1
        print count_dict

        # Calculate frequency of every character
        count = sum(1 for u in seq if u in bases)
        pf = sum((base/count) ** 2 for base in count_dict.values())
        hets.append(1 - pf)
    list_of_hets.append(hets)

print list_of_hets 

Output

{'A': 2, 'T': 2} 
{'A': 1, 'T': 2, 'G': 1} 
{'A': 1, 'C': 1, 'T': 2} 
{'A': 1, 'T': 3} 
[[0.5, 0.625], [0.625, 0.375]] 

This code can be simplified further by using collections.Counter instead of count_dict.
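A possible shape for that simplification, sketched for Python 3 (where / is already true division and print is a function). The helper name het and the constant BASES are illustrative and not part of the answer's code; as in the code above, only characters in 'ACGT' count toward the total, and the sketch assumes each sequence contains at least one such character.

```python
from collections import Counter

BASES = set('ACGT')


def het(seq):
    """Return 1 - sum of squared character frequencies for one sequence."""
    counts = Counter(seq)                      # replaces the manual count_dict loop
    total = sum(1 for u in seq if u in BASES)  # same denominator as the code above
    pf = sum((n / total) ** 2 for n in counts.values())
    return 1 - pf


sample = [['ATTA', 'TTGA'], ['TTCA', 'TTTA']]
list_of_hets = [[het(seq) for seq in element] for element in sample]
print(list_of_hets)  # [[0.5, 0.625], [0.625, 0.375]]
```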

By the way, if the symbols that aren't in 'ACGT' are always 'Z', then we can speed up the calculation of count. Get rid of bases = set('ACGT') and change

count = sum(1 for u in seq if u in bases) 

to

count = sum(1 for u in seq if u != 'Z')
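As a quick sanity check, here is a Python 3 sketch that applies both expressions to the sample data from the question, where 'Z' is the only symbol outside 'ACGT'. The names by_membership and by_exclusion are illustrative.

```python
bases = set('ACGT')
sample = [['TTTT', 'CCCZ'], ['ATTA', 'CZZC']]

# Count the characters that are in 'ACGT'.
by_membership = [[sum(1 for u in seq if u in bases) for seq in element]
                 for element in sample]
# Count the characters that are not 'Z'.
by_exclusion = [[sum(1 for u in seq if u != 'Z') for seq in element]
                for element in sample]

print(by_membership)  # [[4, 3], [4, 2]]
print(by_exclusion)   # [[4, 3], [4, 2]]
```

Both results equal the count list given in the question. The two expressions only agree when no symbol other than 'Z' falls outside 'ACGT'.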
+0

My final output has to be in the form '[[0.5, 0.625], [0.625, 0.375]]', because I need to be able to distinguish the first element in set1 (['ATTA', 'TTGA']) from set2 (['TTCA', 'TTTA']). – biogeek

+0

Also, don't worry about the 'Z's. I have already figured out a way to handle them :) – biogeek

+0

@biogeek: That's easy to do. See the new version of my answer. –