How do I match numbers in decimal form and in scientific notation with a Perl regex?

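I'm new to Perl regular expressions, so I'd appreciate any help. I'm parsing BLAST output. Right now my match only accounts for e-values that are integers or decimals. How can I also match e-values written in scientific notation?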

blastoutput.txt

               Score  E 
Sequences producing significant alignments:      (Bits) Value 

ref|WP_001577367.1| hypothetical protein [Escherichia coli] >... 75.9 4e-15 
ref|WP_001533923.1| cytotoxic necrotizing factor 1 [Escherich... 75.9 7e-15 
ref|WP_001682680.1| cytotoxic necrotizing factor 1 [Escherich... 75.9 7e-15 
ref|ZP_15044188.1| cytotoxic necrotizing factor 1 domain prot... 40.0 0.002 
ref|YP_650655.1| hypothetical protein YPA_0742 [Yersinia pest... 40.0 0.002 

ALIGNMENTS 
>ref|WP_001577367.1| hypothetical protein [Escherichia coli]

parse.pl

open (FILE, './blastoutput.txt'); 
my $marker = 0; 
my @one; 
my @acc; 
my @desc; 
my @score; 
my @evalue; 
my $counter=0; 
while(<FILE>){ 
    chomp; 
    if($marker==1){ 
    if(/^(\D+)\|(.+?)\|\s(.*?)\s(\d+)(\.\d+)? +(\d+)([\.\d+]?) *$/) { 
    #if(/^(\D+)\|(.+?)\|\s(.*?)\s(\d+)(\.\d+)? +(\d+)((\.\d+)?(e.*?)?) *$/) 
      $one[$counter] = $1; 
      $acc[$counter] = $2; 
      $desc[$counter] = $3; 
      $score[$counter] = $4+$5; 
      if(! $7){ 
       $evalue[$counter] = $6; 
      }else{ 
       $evalue[$counter] = $6+$7; 
      } 
      $counter++; 
     } 
    } 
    if(/Sequences producing significant alignments/){ 
     $marker = 1; 
    }elsif(/ALIGNMENTS/){ 
     $marker = 0; 
    }elsif(/No significant similarity found/){ 
     last; 
    } 
} 
for(my $i=0; $i < scalar(@one); $i++){ 
    print "$one[$i] | $acc[$i] | $desc[$i] | $score[$i] | $evalue[$i]\n"; 
} 
close FILE;

2013-05-06 Steve

You can match a number that may or may not be in scientific notation with this:

\d+(?:\.\d+)?+(?:e[+-]?\d+)?+
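To see what this pattern accepts, here is a minimal sketch that anchors it and tries it on a few strings. The helper name `is_blast_number` and the test strings are assumptions made for illustration; possessive quantifiers need Perl 5.10 or later.

```perl
use strict;
use warnings;

# The number pattern from above, stored once so it can be reused.
# Possessive quantifiers (?+) need Perl 5.10 or later.
my $number = qr/\d+(?:\.\d+)?+(?:e[+-]?\d+)?+/;

# Hypothetical helper: returns 1 if $text is entirely one such number, else 0.
sub is_blast_number {
    my ($text) = @_;
    return $text =~ /\A$number\z/ ? 1 : 0;
}

# Try plain decimals, scientific notation, and a few malformed strings.
for my $candidate (qw(75.9 0.002 4e-15 7e+3 1.5e10 e-15 4e- abc)) {
    printf "%-7s %s\n", $candidate,
        is_blast_number($candidate) ? 'match' : 'no match';
}
```

Note that the pattern only accepts a lowercase `e`, which is what the BLAST report above uses; add the `/i` modifier if an uppercase `E` can occur in your data.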

With your code:

if (/^([^|]+)\|([^|]+)\|\s++(.*?)\s(\d+(?:\.\d+)?+)\s+(\d+(?:\.\d+)?+(?:e[+-]?\d+)?+)\s*$/) { 
    $one[$counter] = $1; 
    $acc[$counter] = $2; 
    $desc[$counter] = $3; 
    $score[$counter] = $4; 
    $evalue[$counter] = $5; 
    $counter++; 
}

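(I have added some possessive quantifiers, ++ and ?+, to reduce the number of backtracking steps as much as possible, but group 3 still uses a lazy quantifier. It would be better, if possible, to describe that part with a more precise pattern.)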
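As a self-contained check, the regex above can be wrapped in a small function and run against lines from blastoutput.txt. This is only a sketch: the function name `parse_hit_line` and the hash keys are assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex above: returns a hash reference
# for a hit line, or nothing if the line does not look like a hit.
sub parse_hit_line {
    my ($line) = @_;
    return unless $line =~ /^([^|]+)\|([^|]+)\|\s++(.*?)\s(\d+(?:\.\d+)?+)\s+(\d+(?:\.\d+)?+(?:e[+-]?\d+)?+)\s*$/;
    return {
        db     => $1,
        acc    => $2,
        desc   => $3,
        score  => $4 + 0,   # adding 0 converts the captured text to a number
        evalue => $5 + 0,   # works for "0.002" and "4e-15" alike
    };
}

# Two hit lines and one non-hit line taken from the sample report.
my @lines = (
    'ref|WP_001577367.1| hypothetical protein [Escherichia coli] >... 75.9 4e-15 ',
    'ref|ZP_15044188.1| cytotoxic necrotizing factor 1 domain prot... 40.0 0.002 ',
    'ALIGNMENTS ',
);

for my $line (@lines) {
    my $hit = parse_hit_line($line) or next;   # skip lines that are not hits
    print "$hit->{acc} | $hit->{desc} | $hit->{score} | $hit->{evalue}\n";
}
```

Because the e-value is captured as a single group, Perl's own string-to-number conversion handles the exponent, so the `$6+$7` arithmetic from the question is no longer needed.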

2013-05-06 04:31:19

You can also avoid matching the numbers altogether:

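while(<FILE>){ 
    chomp; 
    # leave the hit table once the ALIGNMENTS section starts 
    $marker = 0 if $marker and /ALIGNMENTS/; 
    # split "db|accession| description ... score evalue" on the pipes; 
    # a blank line yields an empty list, so the condition is false for it 
    if($marker == 1 and my ($r, $w, $d) = split(/[|]/)) { 
      # score and e-value are the last two whitespace-separated fields 
      my @v = split (/\s+/, $d); 
      print "$v[-2]\t$v[-1]\n"; 
      # some processing ... 
    } 
    # start reading hits on the line after the table header 
    $marker = 1 if /Sequences producing significant alignments/; 
    last if /No significant similarity found/; 
}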
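The same idea can be tried in isolation on a single line. The sketch below is an assumption-laden variant, not the answerer's code: the helper name `score_and_evalue` is made up, and it adds guards for lines that have no pipes or too few fields.

```perl
use strict;
use warnings;

# Hypothetical helper: take the score and e-value as the last two
# whitespace-separated fields of a hit line, without matching numbers.
sub score_and_evalue {
    my ($line) = @_;
    my @parts = split /[|]/, $line;
    return unless @parts >= 3;             # not a "db|accession|rest" line
    my @fields = split ' ', $parts[-1];    # ' ' also skips leading whitespace
    return unless @fields >= 2;            # need at least score and e-value
    return @fields[-2, -1];
}

my ($score, $evalue) = score_and_evalue(
    'ref|YP_650655.1| hypothetical protein YPA_0742 [Yersinia pest... 40.0 0.002 '
);
print "$score\t$evalue\n";
```

The trade-off is that nothing checks the two fields are actually numbers, so this relies on every line in the hit table having the same layout.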

2013-05-06 05:29:34 perreal

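If this is an assignment or Perl practice, then take some of the other advice and try to work out the best solution (but don't stop there: you will find plenty on the internet, and even some books cover the topic of parsing BLAST!). In practice, though, you would never want to parse a BLAST report this way, because your code won't be readable and it is not guaranteed to keep working in the future, since the plain-text report format may change.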

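I strongly recommend that you stick to either the XML output or the tab-delimited table format, and use BioPerl's Bio::SearchIO to parse your reports. For example, if you take a look at the Bio::SearchIO HOWTO, you can see that it is quite easy to select certain parts of the report and filter by some criteria with very little Perl knowledge. If you want to come up with a non-BioPerl solution, I would suggest the tab-delimited format, which will make things easier for you in the future (and then you can implement complex tasks in a manageable and readable way).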
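To illustrate the non-BioPerl route, here is a minimal sketch for the tab-delimited format. It assumes the default twelve columns of BLAST+ tabular output (`-outfmt 6`); the helper name `parse_tabular_line`, the 1e-5 cutoff, and the example record are made up for illustration and do not come from a real BLAST run.

```perl
use strict;
use warnings;

# Default column order of BLAST+ tabular output (-outfmt 6).
my @columns = qw(qseqid sseqid pident length mismatch gapopen
                 qstart qend sstart send evalue bitscore);

# Hypothetical helper: turn one tab-delimited line into a hash reference,
# or return nothing if the line does not have exactly twelve fields.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    my @values = split /\t/, $line;
    return unless @values == @columns;
    my %hit;
    @hit{@columns} = @values;   # hash slice: pair each column name with its value
    return \%hit;
}

# Made-up record for illustration only; not taken from a real BLAST run.
my $example = join "\t",
    qw(query1 WP_001577367.1 35.2 120 70 3 5 120 10 130 4e-15 75.9);

# Print the hit only if its e-value passes an arbitrary example cutoff.
my $hit = parse_tabular_line($example);
print "$hit->{sseqid}\t$hit->{bitscore}\t$hit->{evalue}\n"
    if $hit && $hit->{evalue} <= 1e-5;
```

With this format the e-value is just one field, so no number regex is needed at all: Perl converts `4e-15` to a number as soon as it is used in a numeric comparison.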

2013-05-06 19:40:32 SES
