Python: How to count the frequency of trinucleotides

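My code runs fine, but when I run the assertion check it does not pass. The error says the key should be a string, not a tuple. I can see what the problem is, but I don't know how to fix it.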

AssertionError: 

    <class 'tuple'> != <class 'str'>

def frequency(dna_sequence): 
    ''' 
    takes a DNA sequence (in string format) as input, parses it into codons using parse_sequence(), 
    counts each type of codon and returns the codons' frequency as a dictionary of counts; 
    the keys of the dictionary must be in string format 
    ''' 
    codon_freq = dict() 

    # split string with parse_sequence() 
    parsed = parse_sequence(dna_sequence) # defined earlier (shown below); it returns a list of one-element tuples, one per codon

    # count each type of codons in DNA sequence 
    from collections import Counter 
    codon_freq = Counter(parsed) 

    return codon_freq 

codon_freq1 = codon_usage(dna_sequence1) 
print("Sequence 1 Codon Frequency:\n{0}".format(codon_freq1)) 

codon_freq2 = codon_usage(dna_sequence2) 
print("\nSequence 2 Codon Frequency:\n{0}".format(codon_freq2))

Assertion check

assert_equal(codon_usage('ATATTAAAGAATAATTTTATAAAAATATGT'), 
      {'AAA': 1, 'AAG': 1, 'AAT': 2, 'ATA': 3, 'TGT': 1, 'TTA': 1, 'TTT': 1}) 
assert_equal(type((list(codon_frequency1.keys()))[0]), str)

About parse_sequence:

def parse_sequence(dna_sequence): 
    codons = [] 

    if len(dna_sequence) % 3 == 0: 
     for i in range(0,len(dna_sequence),3): 
      codons.append((dna_sequence[i:i + 3],)) 

    return codons

Source

2017-09-24 Mayjunejuly

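Sample data? Could you edit this into a [mcve]? – Stedy

Please read [mcve]; there simply isn't enough information in your question. Perhaps add a minimal example of 'parsed' and the expected result. – wwii

I made some changes. I hope that makes things clearer. – Mayjunejuly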

You are parsing correctly, but the results are tuples rather than the required strings, e.g.,

>>> s = "ATATTAAAGAATAATTTTATAAAAATATGT" 
>>> parse_sequence(s) 
[('ATA',), 
('TTA',), 
('AAG',), 
('AAT',), 
('AAT',), 
('TTT',), 
('ATA',), 
('AAA',), 
('ATA',), 
('TGT',)]

Simply remove the trailing comma from this line:

... 
    codons.append((dna_sequence[i:i + 3],)) 
    ...
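
To make the effect of that comma concrete, here is a small sketch (not part of the original answer; the variable names are made up for illustration). It shows that a trailing comma inside parentheses builds a one-element tuple, and that Counter simply uses whatever it is given as keys:

```python
from collections import Counter

codon = "ATATTA"[0:3]

with_comma = (codon,)     # trailing comma: a one-element tuple
without_comma = (codon)   # parentheses alone change nothing: still a str

print(type(with_comma))     # <class 'tuple'>
print(type(without_comma))  # <class 'str'>

# Counter uses the items themselves as keys, so the key type follows the item type.
tuple_keys = Counter([("ATA",), ("TTA",), ("ATA",)])
str_keys = Counter(["ATA", "TTA", "ATA"])

print(list(tuple_keys)[0])  # ('ATA',)
print(list(str_keys)[0])    # ATA
```

With the comma removed, parse_sequence appends plain strings, so the dictionary keys pass the `str` type check.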

FYI, a sliding window is a technique that can be applied to codon matching. Here is a complete, simplified example using more_itertools.windowed (a third-party tool):

import collections as ct 

import more_itertools as mit 


def parse_sequence(dna_sequence): 
    """Return a generator of codons.""" 
    return ("".join(codon) for codon in mit.windowed(dna_sequence, 3, step=3)) 

def frequency(dna_sequence): 
    """Return a Counter of codon frequency.""" 
    parsed = parse_sequence(dna_sequence) 
    return ct.Counter(parsed)

Test

s = "ATATTAAAGAATAATTTTATAAAAATATGT" 
expected = {'AAA': 1, 'AAG': 1, 'AAT': 2, 'ATA': 3, 'TGT': 1, 'TTA': 1, 'TTT': 1} 
assert frequency(s) == expected
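
If installing a third-party package is not an option, the same non-overlapping window can be sketched with the standard library alone. This is only an illustration, not part of the original answer. It reuses the names parse_sequence and frequency from above and assumes that a trailing fragment shorter than three bases should be dropped, which the question does not specify:

```python
from collections import Counter


def parse_sequence(dna_sequence):
    """Yield non-overlapping three-letter codons as plain strings."""
    # Assumption (not from the question): ignore a trailing fragment
    # that is shorter than one full codon.
    usable = len(dna_sequence) - len(dna_sequence) % 3
    for start in range(0, usable, 3):
        yield dna_sequence[start:start + 3]


def frequency(dna_sequence):
    """Return a Counter mapping each codon string to its count."""
    return Counter(parse_sequence(dna_sequence))


s = "ATATTAAAGAATAATTTTATAAAAATATGT"
expected = {'AAA': 1, 'AAG': 1, 'AAT': 2, 'ATA': 3, 'TGT': 1, 'TTA': 1, 'TTT': 1}
assert frequency(s) == expected
```

Slicing a string always returns a string, so the keys are `str` without any joining step.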

Source

2017-09-24 04:06:32 pylang

You may find it easier to use a Counter directly on a comprehension. For example:

>>> s = 'ATATTAAAGAATAATTTTATAAAAATATGT'
>>> # Python 3: range and // replace the Python 2 xrange and /
>>> [s[3*i:3*i+3] for i in range(0, len(s)//3)]
['ATA', 'TTA', 'AAG', 'AAT', 'AAT', 'TTT', 'ATA', 'AAA', 'ATA', 'TGT']
>>> from collections import Counter
>>> Counter([s[3*i:3*i+3] for i in range(0, len(s)//3)])
Counter({'ATA': 3, 'AAT': 2, 'TTA': 1, 'AAG': 1, 'TTT': 1, 'AAA': 1, 'TGT': 1})

Source

2017-09-24 01:31:46 jq170727
