Separating letters based on position

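I have a .fa file with a sequence of letters like ACGGGGTTTTGGGCCCGGGGG and a .txt file that lists start and stop positions, for example start 2, stop 7. How can I extract only the letters at specific positions from my .fa file and create a new file that contains only the letters from those positions? I wrote the code below, but I get the error "string index out of range". My positions .txt file is just a list of positions like [[1,52],[66,88] ....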

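my_file = open('dna.fa')
transcript = my_file.read()
positions = open('exons.txt')
positions = positions.read()  # a single string such as "[[1,52],[66,88]]", not a list
coding_sequence = ''  # declare the variable

for i in xrange(len(positions)):  # loops over the characters of that string
    start = positions[i][0]  # positions[i] is a one-character string
    stop = positions[i][1]  # so this raises "string index out of range"
    exon = transcript[start:stop]
    coding_sequence = coding_sequence + exon
print coding_sequence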
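The error comes from positions.read() returning one string, so positions[i] is a single character and positions[i][1] does not exist. A minimal corrected sketch (Python 3 syntax; the helper names read_fasta_sequence, parse_positions, extract_exons and main are hypothetical) parses exons.txt with ast.literal_eval, assuming it holds a complete Python-style list such as [[1,52],[66,88]]. It also drops '>' header lines and line breaks so positions count sequence letters only, and it assumes 0-based starts with exclusive stops (Python slice semantics).

```python
import ast


def read_fasta_sequence(fasta_text):
    """Join the sequence lines, dropping '>' header lines and line breaks."""
    return "".join(
        line.strip() for line in fasta_text.splitlines() if not line.startswith(">")
    )


def parse_positions(positions_text):
    """Parse text such as "[[1,52],[66,88]]" into a list of [start, stop] pairs."""
    # literal_eval only accepts Python literals, so it is safer than eval()
    return ast.literal_eval(positions_text)


def extract_exons(sequence, pairs):
    """Concatenate sequence[start:stop] for every [start, stop] pair.

    Assumes 0-based starts and exclusive stops (Python slice semantics);
    for 1-based, inclusive coordinates use sequence[start - 1:stop] instead.
    """
    return "".join(sequence[start:stop] for start, stop in pairs)


def main(fasta_path, positions_path, out_path):
    """Read both input files and write the joined exons to out_path."""
    with open(fasta_path) as fa:
        sequence = read_fasta_sequence(fa.read())
    with open(positions_path) as ex:
        pairs = parse_positions(ex.read())
    with open(out_path, "w") as out:
        out.write(extract_exons(sequence, pairs))
```

With the question's files this would be called as main("dna.fa", "exons.txt", "coding_sequence.fa"), where the output name is only an example. If the coordinates are meant to include the header and line breaks, skip read_fasta_sequence and slice the raw file text instead.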


2016-02-27 Sergey Bombin

Assuming your positions are stored in a list called positions, your input file is named infile.fa, and your output file is named outfile.fa:

with open("infile.fa") as infile:
    text = infile.read()
    letters = "".join(text[i] for i in positions)
    with open("outfile.fa", "w") as outfile:
        outfile.write(letters)
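The code above treats positions as individual character indices, whereas the question's exons.txt holds [start, stop] pairs. A minimal sketch (Python 3; expand_ranges and pick_letters are hypothetical helper names) that expands the pairs into single indices first, assuming 0-based starts and exclusive stops. When text is the raw file content, these indices also count a '>' header line and line breaks.

```python
def expand_ranges(pairs):
    """Turn [start, stop] pairs into a flat list of individual indices.

    Assumes 0-based starts and exclusive stops, the same convention as range().
    """
    return [i for start, stop in pairs for i in range(start, stop)]


def pick_letters(text, positions):
    """Keep text[i] for each index, as in the one-liner above."""
    return "".join(text[i] for i in positions)


# Example: letters 0-1 and 13-15 of the question's sample sequence
letters = pick_letters("ACGGGGTTTTGGGCCCGGGGG", expand_ranges([[0, 2], [13, 16]]))
```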

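As @KIDJourney pointed out in a comment, reading the whole file with infile.read() could in theory fail for a file so large that there is not enough memory to store it. Here is how you could do it in that case: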

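wanted = set(positions)  # a set makes each "in" test O(1) instead of scanning a list
with open("infile.fa") as infile:
    with open("outfile.fa", "w") as outfile:  # "w" starts a fresh file; in "a" mode writes may go to the end regardless of seek(0)
        i = 0  # offset of the current character within the whole file
        for line in infile:
            for char in line:
                if i in wanted:
                    outfile.write(char)
                i += 1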
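Iterating line by line keeps memory low only when the file actually contains line breaks; if the whole sequence sits on one very long line, as in the question's example, that single line is still read into memory in one piece. A sketch that reads fixed-size chunks instead (Python 3; extract_positions_chunked and chunk_size are hypothetical names). It takes already-open file objects, so it can be tried with io.StringIO before pointing it at real files.

```python
import io


def extract_positions_chunked(infile, outfile, positions, chunk_size=65536):
    """Copy the characters whose 0-based file offsets are in positions.

    Reads chunk_size characters at a time, so memory use stays bounded
    even when the whole sequence sits on one very long line.
    """
    wanted = set(positions)  # O(1) membership tests
    offset = 0  # file offset of the first character in the current chunk
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:  # empty string means end of file
            break
        for i, char in enumerate(chunk, start=offset):
            if i in wanted:
                outfile.write(char)
        offset += len(chunk)


# Small in-memory demonstration; with real files, pass open(...) handles instead.
source = io.StringIO("ACGGGGTTTTGGGCCCGGGGG")
result = io.StringIO()
extract_positions_chunked(source, result, [0, 1, 13, 14], chunk_size=4)
```

The tiny chunk_size in the demonstration only forces several chunk boundaries; for real data a larger value such as the default keeps the number of read() calls low.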


2016-02-28 00:15:51 zondo

What if the file is too large to fit in RAM? – KIDJourney

I've added a second solution. – zondo

If you try to do this with a very large file, @zondo's solution may fail by running out of memory.

You can use seek when you only need to read part of a file.

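def readData(filename, start_pos, end_pos):
    # Binary mode, so seek() offsets are plain byte positions
    with open(filename, 'rb') as f:
        f.seek(start_pos)  # jump to start_pos without reading what precedes it
        data = f.read(end_pos - start_pos)  # read only the requested bytes
    return data.decode('ascii')  # sequence letters are plain ASCII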
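Applied to the question's [start, stop] pairs, the same seek idea can copy each range straight into an output file. A sketch assuming 0-based byte offsets into the raw file (so a '>' header line and line breaks are counted) with exclusive stops; extract_ranges_by_seek and the temporary file names are hypothetical. It opens the input once and seeks for each pair rather than reopening the file per range.

```python
import os
import tempfile


def extract_ranges_by_seek(in_path, out_path, pairs):
    """Write in_path[start:stop] for every [start, stop] pair to out_path.

    Offsets are 0-based byte positions in the raw file. Only the requested
    bytes are read, so memory use depends on the range lengths, not the file size.
    """
    with open(in_path, "rb") as src, open(out_path, "w") as out:
        for start, stop in pairs:
            src.seek(start)  # jump straight to the start of the range
            out.write(src.read(stop - start).decode("ascii"))


# Demonstration with a throwaway file holding the question's sample sequence
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "infile.fa")
    dst_path = os.path.join(tmp, "outfile.fa")
    with open(src_path, "w") as f:
        f.write("ACGGGGTTTTGGGCCCGGGGG")
    extract_ranges_by_seek(src_path, dst_path, [[0, 2], [13, 16]])
    with open(dst_path) as f:
        demo_output = f.read()
```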


2016-02-28 00:32:52 KIDJourney
