2012-12-10

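How can I randomly change the transition matrix of a Markov chain?

I am using a first-order transition matrix to generate a DNA sequence. Now I need to give the transition matrix a probability of changing every 1000 steps. Let's say that every 1000 steps, the transition matrix has a 40% probability of changing. After a change, each row should still sum to 1. I don't know how to access the values of a nested dictionary in Python, or how to implement the 40% probability of change. I have attached my code here; any suggestion is appreciated.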

#!/usr/bin/env python

import sys, random


length = 10000

# Outer key: current base. Inner dict: probability of each next base.
tran_matrix = {'a': {'a':0.495,'c':0.113,'g':0.129,'t':0.263},
               'c': {'a':0.129,'c':0.063,'g':0.413,'t':0.395},
               't': {'a':0.213,'c':0.495,'g':0.263,'t':0.029},
               'g': {'a':0.263,'c':0.129,'g':0.295,'t':0.313}}

initial_p = {'a':0.25,'c':0.25,'t':0.25,'g':0.25}

def choose(dist):
    # Pick a key of dist with probability equal to its value.
    r = random.random()
    total = 0.0
    keys = list(dist.keys())  # list() so that keys[-1] also works on Python 3
    for k in keys:
        total += dist[k]
        if total > r:
            return k
    # Fallback in case rounding leaves the running total just below r
    return keys[-1]

c = choose(initial_p)
for i in range(length):
    sys.stdout.write(c)
    c = choose(tran_matrix[c])

Answer

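Edit: added some code as a quick implementation of generating the new transition frequencies. You may need to look around to find which random number generator best suits your situation, and see whether putting some limits on the random probabilities gives you more plausible sequences.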

import sys, random


LENGTH = 10000
CHANGE_EVERY = 1000
CHANGE_PROB = 0.4

# Outer key: current base. Inner dict: probability of each next base.
tran_matrix = {'a': {'a':0.495,'c':0.113,'g':0.129,'t':0.263},
               'c': {'a':0.129,'c':0.063,'g':0.413,'t':0.395},
               't': {'a':0.213,'c':0.495,'g':0.263,'t':0.029},
               'g': {'a':0.263,'c':0.129,'g':0.295,'t':0.313}}

initial_p = {'a':0.25,'c':0.25,'t':0.25,'g':0.25}


def choose(dist):
    # Pick a key of dist with probability equal to its value.
    r = random.random()
    total = 0.0
    keys = list(dist.keys())  # list() so that keys[-1] also works on Python 3
    for k in keys:
        total += dist[k]
        if total > r:
            return k
    # Fallback in case rounding leaves the running total just below r
    return keys[-1]


def new_probs(precision=2):
    """
    Generate a dictionary of random transition frequencies, of the form
    {'a':0.495,'c':0.113,'g':0.129,'t':0.263}
    """
    probs = []
    total_prob = 0
    # Choose a random probability p1 from a uniform distribution in
    # the range (0, 1), then choose p2 in the range (0, 1 - p1), etc.
    for i in range(3):
        up_to = 1 - total_prob
        p = round(random.uniform(0, up_to), precision)
        probs.append(p)
        total_prob += p
    # Final probability is 1 - (sum of first 3 probabilities),
    # rounded to hide floating-point noise such as 0.30000000000000004
    probs.append(round(1 - total_prob, precision))
    # Assign randomly to bases
    # If you don't shuffle the order of the bases each time, 't'
    # would end up with consistently lower probabilities
    bases = ['a', 'c', 'g', 't']
    random.shuffle(bases)
    new_prob_dict = {}
    for base, prob in zip(bases, probs):
        new_prob_dict[base] = prob
    return new_prob_dict


c = choose(initial_p)
for i in range(LENGTH):
    # Skip i == 0 so the initial matrix is used for the first 1000 steps
    if i > 0 and i % CHANGE_EVERY == 0:
        dice_roll = random.random()
        if dice_roll < CHANGE_PROB:
            for base in tran_matrix:
                # Generate a new probability dictionary for each
                # base in the transition matrix
                tran_matrix[base] = new_probs()
    sys.stdout.write(c)
    c = choose(tran_matrix[c])
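
On the nested-dictionary part of the question, here is a minimal sketch of reading a single transition probability, checking that every row still sums to 1, and rolling the 40% chance. The helper name `rows_sum_to_one` and the tolerance `tol` are made up for illustration and are not part of the code above.

```python
import math
import random

CHANGE_PROB = 0.4

tran_matrix = {'a': {'a': 0.495, 'c': 0.113, 'g': 0.129, 't': 0.263},
               'c': {'a': 0.129, 'c': 0.063, 'g': 0.413, 't': 0.395},
               't': {'a': 0.213, 'c': 0.495, 'g': 0.263, 't': 0.029},
               'g': {'a': 0.263, 'c': 0.129, 'g': 0.295, 't': 0.313}}

# First key selects the row (current base), second key the column (next base).
p_a_to_c = tran_matrix['a']['c']


def rows_sum_to_one(matrix, tol=1e-9):
    """Return True if every row of the nested dict sums to 1 within tol."""
    # Hypothetical helper: iterates over the inner dicts and sums their values.
    return all(math.isclose(sum(row.values()), 1.0, abs_tol=tol)
               for row in matrix.values())


# random.random() is uniform on [0, 1), so this comparison is True
# about 40% of the time.
should_change = random.random() < CHANGE_PROB
```

Calling `rows_sum_to_one(tran_matrix)` after each replacement is a cheap way to catch a row that no longer adds up.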

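Hi Marius, thanks for your help. Yes, this should work for me. The change_matrix() function should randomly generate 4 floats that add up to 1. For example, in 'a': {'a':0.495, 'c':0.113, 'g':0.129, 't':0.263} the four numbers add up to 1, and I just need 4 new random numbers to replace them. – Frank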

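@Frank: Take a look at the `new_probs()` function I added - you may have to try different ways of generating the random numbers, because my implementation can produce long runs of the same base. – Marius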
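
One possible alternative generator, offered only as a sketch and not as what the answer uses: draw four independent Gamma weights and normalize them, which samples a row from a symmetric Dirichlet distribution. The function name `dirichlet_probs` and the parameter `alpha` are assumptions introduced here. With `alpha` above 1 the rows tend to stay closer to 0.25 each, which should make a single dominant base (and therefore long runs of it) less likely than with the sequential uniform draws in `new_probs()`.

```python
import random

BASES = ('a', 'c', 'g', 't')


def dirichlet_probs(alpha=2.0, rng=random):
    """Return {base: probability} sampled from a symmetric Dirichlet(alpha).

    Hypothetical replacement for new_probs(): normalizing independent
    Gamma(alpha, 1) draws yields a Dirichlet sample, so the four values
    are non-negative and sum to 1 (up to floating-point error).
    """
    weights = [rng.gammavariate(alpha, 1.0) for _ in BASES]
    total = sum(weights)
    return {base: w / total for base, w in zip(BASES, weights)}


# Build a whole new transition matrix, one random row per current base.
new_matrix = {base: dirichlet_probs() for base in BASES}
```

Setting `alpha=1.0` gives rows that are uniform over all valid probability vectors; larger values pull every row toward equal probabilities, so the value is worth tuning against the sequences you expect.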

Works perfectly, thank you very much Marius, you're awesome. I'll click the useful button once I have enough reputation. Have a nice day! – Frank