Split a string by containers in Python

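So I'm trying to write a program that can be used to analyze DNA, and right now I'm trying to split the genes into "branches". To do this, I need to scan the strand and split it on any of the three stop codons (groups of three base pairs). My code currently looks like this: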

class Strand:

    def __init__(self, code):
        self.code = [code]
        self.endCodons = []
        self.genes = []

    def getGenes(self):
        for codon in self.endCodons:
            for code in self.code:
                code = code.split(codon)


strand = Strand("ATCATGCACATAGAAACTGATACACACCACAGTGATCACATGAAGTACACATG") 
strand.getGenes() 
print(strand.genes)

However, when I run it, it prints an empty list. I could use some advice.


2015-06-06 mcchucklezz

What do you expect your program to do? `self.endCodons` is an empty list, so `getGenes()` won't do anything. – MattDMo

What are *stop codons*? You need to put them in your question! – Kasramvd

https://www.google.com/#q=stop+codons – MattDMo
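As MattDMo notes, `self.endCodons` is empty, so the outer loop never runs. Even with codons in the list, `code = code.split(codon)` only rebinds the loop variable, and nothing is ever added to `self.genes`, so the result stays `[]`. Below is a minimal sketch of the same split-based idea with both problems fixed, assuming the three standard stop codons TAG, TAA and TGA. It discards the stop codons and ignores the reading frame.

```python
class Strand:
    def __init__(self, code):
        self.code = code
        # Assumption: the three standard stop codons (the question left this list empty)
        self.endCodons = ["TAG", "TAA", "TGA"]
        self.genes = []

    def getGenes(self):
        pieces = [self.code]
        for codon in self.endCodons:
            # Split every piece found so far on this codon and keep the results,
            # instead of rebinding a loop variable whose value is then thrown away
            pieces = [part for piece in pieces for part in piece.split(codon)]
        # Drop empty strings left behind by adjacent, leading or trailing codons
        self.genes = [piece for piece in pieces if piece]


strand = Strand("ATCATGCACATAGAAACTGATACACACCACAGTGATCACATGAAGTACACATG")
strand.getGenes()
print(strand.genes)
```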

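Looping over each stop codon and splitting on it will give incorrect output. As I understand it, the stop codons can appear in the sequence in any order, whereas iterating over the list of stop codons would require the stops to appear in that same order.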
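One way to avoid depending on the order of the codon list is to look for all three stop codons in a single left-to-right pass, for example with a regular-expression alternation. The following is only a sketch, and the helper name `split_on_stop_codons` is hypothetical. It keeps each stop codon at the end of the piece it terminates and, like a plain `split`, does not check the reading frame.

```python
import re

# Assumption: the three standard stop codons, matched in one alternation
STOP_CODON_PATTERN = re.compile(r"TAG|TAA|TGA")


def split_on_stop_codons(sequence):
    """Cut after every stop codon, wherever it occurs, in one left-to-right scan."""
    pieces = []
    start = 0
    for match in STOP_CODON_PATTERN.finditer(sequence):
        # Include the stop codon itself in the piece it ends
        pieces.append(sequence[start:match.end()])
        start = match.end()
    if start < len(sequence):
        # Whatever follows the last stop codon is kept as a trailing piece
        pieces.append(sequence[start:])
    return pieces


print(split_on_stop_codons("ATCATGCACATAGAAACTGATACACACCACAGTGATCACATGAAGTACACATG"))
```

On "AGTAGATAA" this still yields "AGTAG" and "ATAA", because the TAG it finds starts at index 2. Respecting the reading frame requires stepping through the sequence one codon at a time.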

So, if I understand correctly, you will need a method that scans your string from left to right and searches for the codons:

class Strand:
    def __init__(self, code):
        self.code = code
        self.endCodons = ["TAG", "TAA", "TGA"]
        self.genes = []

    def getGenes(self):
        if len(self.code) % 3 != 0:
            print("Input sequence is not divisible by 3?")

        # In this, we assume each stop codon is always 3 characters.
        iteration = 0
        lastGeneEnd = 0
        while iteration < len(self.code):
            # What is our current 3 character sequence? (Shorter at the very end)
            currentSequence = self.code[iteration:iteration + 3]

            # Check if our current 3 character sequence is an end codon
            if currentSequence in self.endCodons:
                # What will our gene length be?
                geneLength = (iteration + 3) - lastGeneEnd

                # Make sure we only break into multiples of 3
                overlap = 3 - (geneLength % 3)
                # There is no overlap if our length is already a multiple of 3
                if overlap == 3:
                    overlap = 0

                # Modify the gene length to reflect our overlap into a multiple of 3
                geneLength = geneLength + overlap

                # Update the iteration so we don't process any more than we need
                iteration = iteration + overlap + 3

                # Grab the entire gene sequence, including the stop codon
                gene = self.code[lastGeneEnd:iteration]

                # If we have a 3-length gene and there's nothing left, append it to the
                # last gene retrieved, as it has to be part of the last sequence
                # (only possible if at least one gene has already been found)
                if len(gene) == 3 and iteration >= len(self.code) and self.genes:
                    self.genes[-1] = self.genes[-1] + gene
                    break

                # Update the last end index so we don't include portions of previous positives
                lastGeneEnd = iteration

                # Append the result to our genes and continue
                self.genes.append(gene)
                continue

            iteration = iteration + 1

strand = Strand("ATCATGCACATAGAAACTGATACACACCACAGTGATCACATGAAGTACACATG") 
strand.getGenes() 
print("Got Genes: ") 
print(strand.genes) 

for gene in strand.genes: 
    print("Sequence '%s' is a multiple of 3: %u" % (gene, len(gene) % 3 == 0))

I'm not really a biologist, so I may have made some incorrect assumptions.

Edit:

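The code now ensures that breaks fall on multiples of three, but I still don't think I fully understand the required logic. It works for the given example, but I'm not sure it behaves correctly in other cases.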
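If every break must fall on a multiple of three, one way to guarantee that by construction is to walk the sequence one codon (three characters) at a time from index 0 and cut only after an in-frame stop codon. This is a minimal sketch under those assumptions. The reading frame is assumed to start at index 0, `split_in_frame` is a hypothetical helper name, and any bases left after the last in-frame stop codon, including an incomplete final codon, are kept as a final piece.

```python
# Assumption: the three standard stop codons
STOP_CODONS = {"TAG", "TAA", "TGA"}


def split_in_frame(sequence):
    """Split after each stop codon that starts at an index divisible by 3."""
    genes = []
    start = 0
    # Step through complete codons only, so every cut lands on a multiple of 3
    for i in range(0, len(sequence) - 2, 3):
        if sequence[i:i + 3] in STOP_CODONS:
            genes.append(sequence[start:i + 3])
            start = i + 3
    if start < len(sequence):
        # Bases after the last in-frame stop codon (possibly an incomplete codon)
        genes.append(sequence[start:])
    return genes


print(split_in_frame("AGTAGATAA"))
```

The question's 53-base sample has no stop codon among the codons read from index 0, so this sketch returns the whole sequence as a single piece.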


2015-06-06 23:16:28 Ragora

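Your code works well, except that it doesn't split the genes into groups of 3. For example, "AGTAGATAA" should come out as one gene, but it comes out as "AGTAG ATAA". It should only break genes at multiples of 3. – mcchucklezz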

Ah, OK, I'll correct it (I misunderstood how it should split). – Ragora

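It now splits into multiples of three, but I don't know whether it's exactly what you need. It also assumes that the input length is always divisible by 3 (hence the check for a dangling 3-character sequence); is that a correct assumption? It also assumes you won't encounter a stop codon as the first codon in the string. (Sequences are read from left to right, so this shouldn't happen?) I think a better test now would be to check that it correctly splits a longer sequence, such as the one in the original post. – Ragora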
