1
我必須搜索圖案wTTTAYRTTTW
,其中W
= A
或T
,Y
= C
或T
,R
= A
或R
,在基因組序列的一個FASTA file。應該存在一些不匹配的情況,即完全匹配字符串及其位置。我的做法是:如何獲得FASTA精確匹配的正確計數?
#!user/bin/perl -w
use strict;
my @name = qw(NC_004314.2_10);
for (my $i = 0 ; $i< scalar(@name);$i++)
{
my $fname = $name[$i];
print "$fname\n";
my $read_pat= "WTTTAYRTTTW";
#print"\nPlease enter how many mismatch is allowd : ";
my $m =<STDIN>;
chomp $m;
unless(open(fh1, "$fname")){
print "Cannot open file \"$fname\"\n\n";
exit;
}
my @fh=<fh1>;
close fh1;
if ($fh[0] !~ /^>/)
{
print "not fasta file\n";
exit;
}
my $seq='';
foreach my $line(@fh)
{
if($line =~ /^>/)
{
next;
}
else
{
$seq=$seq.$line;
}
}
sub trans_pat
{
my $pat=shift;
$pat=~s/R/\[CG\]/g;
$pat=~s/W/\[AT\]/g;
$pat=~s/Y/\[AG\]/g;
return $pat;
}
open(FH1,">$fname.csv");
sub find_pat
{
my ($pat,$seq) = (@_);
#print FH1 "Looking for pattern $pat\n";
}
find_pat (trans_pat($read_pat),$seq);
# Allowing for a single mismatch
my $pat=trans_pat($read_pat);
print FH1 "Looking for pattern $pat\n";
while ($seq=~m/(?=$pat)/g)
{
print FH1"match at\t$-[0]\t$&\n"
}
foreach my $i (1..(length $read_pat)-($m-1))
{
my $mis_pat = $read_pat;
substr($mis_pat,$i-1,$m)=".{$m}";
my $pat1=trans_pat($mis_pat);
print FH1 "Looking for pattern $pat1\n";
while ($seq=~m/(=?$pat1)/g)
{
print FH1 "match at\t$-[0]\t$&\n";
}
#print FH1"$& \n";
}
close FH1;
結果這段代碼發現是錯誤的在完全匹配的FASTA文件中給定的順序NC_004314.2,總比賽數應該是829任何一個可以更正此代碼?