Search a FASTA file for a motif and return the header line of each sequence containing the motif

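Below is my code for searching a FASTA file, given on the command line, for a user-supplied motif. When I run it and enter a motif that I know is in the file, it returns 'Motif not found'. I'm only a beginner in Perl, and I can't work out how to get it to print the motif, let alone return the header line. I'd appreciate any help with this problem.

Thanks.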

use warnings; 
use strict; 


my $motif; 
my $filename; 
my @seq; 
#my $motif_found; 
my $scalar; 

$filename = $ARGV[0]; 

open (DNAFILE,$filename) || die "Cannot open file\n"; 
@seq = split(/[>]/, $filename); 
print "Enter a motif to search for; "; 

$motif = <STDIN>; 

chomp $motif; 
foreach $scalar(@seq) { 
    if ($scalar =~ m/$motif/ig) { 
     print "Motif found in following sequences\n"; 
     print $scalar; 
    } else { 
     print "Motif was not found\n"; 
    } 
} 
close DNAFILE;

Source

2010-12-01 Kevin Egan

Please don't write comments that describe a single line of code. They don't add anything. – 2010-12-01 13:50:21

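You are trying to read from the filename, not from the filehandle.

Replace

@seq = split(/[>]/, $filename);

with

@seq = <DNAFILE>;

(or split it, if you need to. I'm not sure what your split on /[>]/ is supposed to do: there is no point in putting a single character inside [].)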
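If the goal is the header line of each record that contains the motif, one way to combine the two steps is to read the file one FASTA record at a time by changing the input record separator `$/`. The following is only a minimal sketch, not the asker's code: the sub name `headers_with_motif` is made up for illustration, and it assumes the motif should be matched as a literal string, ignoring case.

```perl
use strict;
use warnings;

# Return the header line of every FASTA record whose sequence contains $motif.
# Assumptions: $fh is an open filehandle, and $motif is a literal string
# (not a regex) that is matched case-insensitively.
sub headers_with_motif {
    my ($fh, $motif) = @_;
    my @headers;

    local $/ = "\n>";                  # read one FASTA record per iteration
    while (my $record = <$fh>) {
        chomp $record;                 # strip the trailing "\n>" separator
        $record =~ s/^>//;             # the first record still starts with '>'
        my ($header, @lines) = split /\n/, $record;
        next unless defined $header;   # skip empty records
        $header =~ s/\r$//;            # tolerate Windows line endings
        my $sequence = join '', @lines;   # rejoin wrapped sequence lines
        $sequence =~ s/\s+//g;
        push @headers, $header if $sequence =~ /\Q$motif\E/i;
    }
    return @headers;
}

# Usage: perl find_motif.pl sequences.fasta
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    print "Enter a motif to search for: ";
    chomp(my $motif = <STDIN>);
    my @found = headers_with_motif($fh, $motif);
    close $fh;
    if (@found) {
        print ">$_\n" for @found;
    } else {
        print "Motif was not found\n";
    }
}
```

Joining the sequence lines before matching means a motif that spans a line break is still found. Removing `\Q` and `\E` would let the user type a regular expression instead of a literal motif.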

Source

2010-12-01 14:02:49

There is no point in "rolling your own" FASTA parser. BioPerl has spent years developing one, and it would be silly not to use it.

use strict;
use warnings;
use Bio::SeqIO;

my $usage = "perl dnamotif.pl <fasta file> <motif>";
my $fasta_filename = shift(@ARGV) or die("Usage: $usage\n");
my $motif = shift(@ARGV) or die("Usage: $usage\n");

# Let BioPerl handle the FASTA parsing
my $fasta_parser = Bio::SeqIO->new(-file => $fasta_filename, -format => 'Fasta');
while (my $seq_obj = $fasta_parser->next_seq())
{
    printf("Searching sequence '%s'...", $seq_obj->id);
    # index() returns the 0-based offset of the first match, or -1 if there is none
    if ((my $pos = index($seq_obj->seq(), $motif)) != -1)
    {
        printf("motif found at position %d!\n", $pos + 1);
    }
    else
    {
        printf("motif not found.\n");
    }
}

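This program only finds the (1-based) position of the first match in each sequence. It can easily be edited to find the position of every match. It also may not print in exactly the format you want or need. I'll leave these issues as an "exercise for the reader". :)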
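As a hint for that exercise, the third argument of `index` can be used to keep searching after each hit. This is only a sketch: the sub name `motif_positions` is hypothetical, and it works on a plain string, so in the program above it would be passed `$seq_obj->seq()`.

```perl
use strict;
use warnings;

# Return the 1-based start position of every occurrence of $motif in
# $sequence, including overlapping ones. Matching is case-sensitive.
sub motif_positions {
    my ($sequence, $motif) = @_;
    my @positions;
    return @positions if $motif eq '';   # avoid looping forever on an empty motif

    my $offset = 0;
    while ((my $pos = index($sequence, $motif, $offset)) != -1) {
        push @positions, $pos + 1;   # convert the 0-based offset to a 1-based position
        $offset = $pos + 1;          # step one base forward so overlaps are found
    }
    return @positions;
}

my @hits = motif_positions('ATATAGGATA', 'ATA');
print "motif found at positions: @hits\n";   # prints "motif found at positions: 1 3 8"
```

Advancing the offset by one base rather than by the motif length is a design choice: it reports overlapping occurrences, which may or may not be what a given analysis wants.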

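If you need to download BioPerl, try this link. If you have any problems, let me know.

For bioinformatics questions like this one, I have found the BioStar forum very helpful.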

Source

2010-12-01 15:26:23
