2014-12-13 36 views

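I am using BioPython's MuscleCommandline to align sequences in a subprocess. MUSCLE's input and output are stdin and stdout. This works, but every time Popen calls MUSCLE, MUSCLE prints a program summary to the screen. This slows the program down considerably, because there are millions of calls to the subprocess.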

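# Imports for Biopython releases of that era (Bio.Alphabet was removed in Biopython 1.78).
from subprocess import PIPE, Popen
from Bio import AlignIO, SeqIO
from Bio.Align import AlignInfo
from Bio.Align.Applications import MuscleCommandline
from Bio.Alphabet import IUPAC
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

mcline = MuscleCommandline()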
read_list = (SeqRecord(Seq(seq, IUPAC.unambiguous_dna), str(index)) for index, seq in enumerate(grouped_reads_list)) 

muscle = Popen(str(mcline), stdin=PIPE, stdout=PIPE, universal_newlines=True) 

SeqIO.write(read_list, muscle.stdin, "fasta") # Send sequences to Muscle in FASTA format. 
muscle.stdin.close() 

align = AlignIO.read(muscle.stdout, "fasta") # Parse MUSCLE's FASTA output into an alignment object.
print(align) 
muscle.stdout.close() 
exit("Testin Testing") 

consensus_read = AlignInfo.SummaryInfo(align).dumb_consensus(threshold=0.6, ambiguous="N", consensus_alpha=IUPAC.ambiguous_dna) # Create consensus from alignment object. 

The screen output is:

MUSCLE v3.8.31 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

2 seqs, max length 133, avg length 133
00:00:00 10 MB(-1%) Iter 1 100.00% K-mer dist pass 1
00:00:00 10 MB(-1%) Iter 1 100.00% K-mer dist pass 2
00:00:00 12 MB(-1%) Iter 1 100.00% Align node
00:00:00 12 MB(-1%) Iter 1 100.00% Root alignment
6 seqs, max length 133, avg length 133
SingleLetterAlphabet() alignment with 2 rows and 133 columns

Answers


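I am posting this as an answer rather than editing my question, because someone may find it useful. If I have made a mistake, please let me know. The problem seems to lie in using the BioPython MuscleCommandline wrapper this way: when calling MUSCLE through subprocess, I could not pass any command-line options via the wrapper. My modified code is below.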

cmd = ['muscle', "-quiet", "-maxiters", "1", "-diags"] 

read_list = (SeqRecord(Seq(seq, IUPAC.unambiguous_dna), str(index)) for index, seq in enumerate(grouped_reads_list)) 

muscle = Popen(cmd, stdin=PIPE, stdout=PIPE, universal_newlines=True) 

SeqIO.write(read_list, muscle.stdin, "fasta") # Send sequences to Muscle in FASTA format. 
muscle.stdin.close() 

align = AlignIO.read(muscle.stdout, 'fasta') # Parse MUSCLE's FASTA output into an alignment object.

muscle.stdout.close() 

consensus_read = AlignInfo.SummaryInfo(align).dumb_consensus(threshold=0.6, ambiguous="N", consensus_alpha=IUPAC.ambiguous_dna) 
return str(consensus_read) 
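
One possible reason the options could not be passed through the wrapper, offered as an assumption rather than a diagnosis: on POSIX systems, Popen given a single string without shell=True treats the whole string as the program name, so a string such as "muscle -quiet" is not found as an executable. The sketch below splits a command string into an argument list with shlex.split, and also discards the child's stderr, assuming (the post does not confirm this) that MUSCLE writes its banner there. The names to_argv, run_quietly and NOISY_CHILD are hypothetical, and NOISY_CHILD is a stand-in for the aligner so the example runs without MUSCLE installed.

```python
import shlex
import subprocess
import sys


def to_argv(command):
    """Return an argument list suitable for Popen without shell=True."""
    if isinstance(command, str):
        # Split on whitespace using POSIX shell quoting rules.
        return shlex.split(command)
    return list(command)


def run_quietly(command, stdin_text):
    """Feed stdin_text to the command, return its stdout, discard its stderr."""
    proc = subprocess.Popen(
        to_argv(command),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # assumption: the banner goes to stderr
        universal_newlines=True,
    )
    out, _ = proc.communicate(stdin_text)
    return out


# Hypothetical stand-in for the aligner: prints a banner on stderr
# and echoes its stdin to stdout.
NOISY_CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('banner\\n'); "
    "sys.stdout.write(sys.stdin.read())",
]

if __name__ == "__main__":
    print(to_argv("muscle -quiet -maxiters 1 -diags"))
    print(run_quietly(NOISY_CHILD, ">0\nACGT\n"), end="")
```

With the real tool, something like run_quietly(str(mcline), fasta_text) would be the analogous call. Whether that alone silences the banner depends on where MUSCLE actually writes it, so the -quiet flag used in the answer remains the documented route.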

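I guess calling MUSCLE directly may be the better option if anything unexpected happens when aligning sequences through BioPython. In any case, running an MSA should be easy, but doing it through Biopython can be a bit cumbersome.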
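
As a rough illustration of calling the aligner directly, here is a minimal sketch that builds FASTA text, pipes it through an external command with subprocess.run, and parses the FASTA that comes back, all without Biopython. The helper names to_fasta, parse_fasta and align, and the ECHO_CMD stand-in, are hypothetical; ECHO_CMD simply echoes its input so the example runs without MUSCLE. The FASTA parser is deliberately minimal and is not a replacement for AlignIO.

```python
import subprocess
import sys


def to_fasta(seqs):
    """Format sequences as FASTA, using each sequence's index as its name."""
    return "".join(">{}\n{}\n".format(i, s) for i, s in enumerate(seqs))


def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so join them.
            records[name] += line
    return records


def align(seqs, cmd):
    """Pipe FASTA through cmd and return the parsed records it writes."""
    result = subprocess.run(
        cmd,
        input=to_fasta(seqs),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # drop any progress output on stderr
        universal_newlines=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return parse_fasta(result.stdout)


# Hypothetical stand-in that echoes stdin to stdout. With MUSCLE installed,
# the list from the answer above could be passed instead:
# ["muscle", "-quiet", "-maxiters", "1", "-diags"]
ECHO_CMD = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]

if __name__ == "__main__":
    print(align(["ACGT", "ACGA"], ECHO_CMD))
```

Using subprocess.run with input= avoids closing the pipes by hand, and check=True turns a failed aligner run into an exception instead of an empty alignment.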