2011-07-27

How can I extract multiple lines here using Perl?

I'm having some trouble sorting out and extracting multi-line text. Here is my code:

my $searched = $doc->content;
if ($searched =~ /MODIFIED files in Task $_[1] : (.*?) The/gs) {
    print $1, "\n";
    $Modified = $1;
}
if ($searched =~ m/COMPILED in Task $_[1] : (.*?) The/ms) {
    $Compiled = $1;
}
if ($searched =~ m/DELETED in Task $_[1] : (.*?) Comments/ms) {
    $Deleted = $1;
}

Here is an example of the text file:

The following are the MODIFIED files in Task 50104 : 

**Directory    Filename    Version 
---------    --------    ------- 
Something    Something    ..... 
......     ......     ..... 
.......     ........     .....** 

The following are the files to be COMPILED in Task 50104 : 

**Directory    Filename 
---------    -------- 
.........    .........** 


The following are the files to be DELETED in Task 50104 : 

**Directory    Filename 
---------    --------** 

Comments: 
Blah blah....... 

The text between the ** markers is what I want to extract. Sorry for the poor formatting.

Are there really blank lines where you show them? And is the "The following are ..." text guaranteed to be there? – Zaid

Answers

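I'm not sure your text contains spaces around the ':' and before 'The'/'Comments' (in fact, it looks to me like the ':' is followed by a newline and 'The' is preceded by a newline, not a space), so instead of using: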

if($searched =~ /MODIFIED files in Task $_[1] : (.*?) The/gs){ 

try using:

if($searched =~ /MODIFIED files in Task $_[1] :(.*?)The/gs){ 

I also don't think you need the /g or /m modifiers; /s is the one that matters here, because it lets . match newlines.

If that doesn't work, I'd suggest building up your regex step by step: first make sure /MODIFIED files in Task $_[1] :/ matches, then add the rest.
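A minimal sketch of that advice, run against a cut-down copy of the sample text. It assumes the task number is available as a plain scalar ($task is a hypothetical stand-in for $_[1]) and, as a slight variation on the answer, uses \s* around the colon and before the terminator so it works whether those gaps are spaces or newlines. It also ends the first two captures at "The following" rather than just "The", to be a little less likely to stop early:

```perl
use strict;
use warnings;

# Cut-down copy of the sample text from the question (the ** markers omitted).
my $searched = <<'END';
The following are the MODIFIED files in Task 50104 :

Directory    Filename    Version
---------    --------    -------
Something    Something    .....

The following are the files to be COMPILED in Task 50104 :

Directory    Filename
---------    --------
.........    .........

The following are the files to be DELETED in Task 50104 :

Directory    Filename
---------    --------

Comments:
Blah blah
END

my $task = 50104;    # hypothetical stand-in for $_[1]
my ($modified, $compiled, $deleted);

# \s* accepts spaces or newlines around ':' and before the terminator;
# /s lets . match newlines, and neither /g nor /m is needed.
if ($searched =~ /MODIFIED files in Task \Q$task\E\s*:\s*(.*?)\s*The following/s) {
    $modified = $1;
}
if ($searched =~ /COMPILED in Task \Q$task\E\s*:\s*(.*?)\s*The following/s) {
    $compiled = $1;
}
if ($searched =~ /DELETED in Task \Q$task\E\s*:\s*(.*?)\s*Comments/s) {
    $deleted = $1;
}

print "MODIFIED:\n$modified\n\nCOMPILED:\n$compiled\n\nDELETED:\n$deleted\n";
```

\Q...\E is there so that the interpolated task number is matched literally, which matters only if it could ever contain regex metacharacters.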

OMG, genius! It worked, thanks. – Shahab

Here's a quick hack (untested). Instead of reading the whole file into a string, process it line by line:

$ script.pl inputfile.txt

use strict;
use warnings;

my %data;
my $header;
while (<>) {
    next if /^\s*$/;                 # skip empty lines
    last if /^Comments:/;            # stop before the trailing comments section
    if (/^The following are /) {     # header line
        if (/(MODIFIED|COMPILED|DELETED)/) {
            $header = $1;
        } else { die "Bad header: $_" }
    } else {                         # data line
        die "Header expected" unless defined $header;
        $data{$header} .= $_;
    }
}
print "$_:\n$data{$_}\n" for sort keys %data;

Uncannily similar approach, eh? – Zaid

Great minds think alike. – TLP

Flip-flop operator to the rescue!

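The flip-flop operator has a left-hand side and a right-hand side. Once the left side evaluates to true, the flip-flop stays true until the right side evaluates to true (the line on which the right side becomes true is still included).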
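A minimal illustration of that behaviour on a few made-up lines. In scalar context '..' returns a sequence number while it is true, and perlop documents that the last number in a range has "E0" appended, which is a convenient way to notice where a section ends:

```perl
use strict;
use warnings;

my @lines = (
    "The following are the MODIFIED files in Task 50104 :\n",
    "\n",
    "Directory    Filename\n",
    "---------    --------\n",
    "Something    Something\n",
    "\n",
    "Comments:\n",
);

my @in_range;    # lines selected by the flip-flop
my @seqs;        # sequence values it returned for those lines

for (@lines) {
    # Scalar-context '..': true from the first /^Directory/ line
    # through the next blank line, inclusive.
    if (my $seq = /^Directory/ .. /^\s*$/) {
        push @in_range, $_;
        push @seqs, $seq;
    }
}

print "seq $seqs[$_]: $in_range[$_]" for 0 .. $#seqs;
```

Here @seqs ends up as (1, 2, 3, "4E0"): the blank line that switches the range off is still included, and it is the one tagged with "E0".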

use strict;
use warnings;

my $searched = $doc->content;

my %info;    # section type => extracted table text

open my $string, '<', \$searched or die $!;

{
    my ($type, $content);

    while (<$string>) {    # Process $searched line by line

        # Remember the section type from its header line
        $type = $1 if /^The following are .*\b(MODIFIED|COMPILED|DELETED)\b/;

        # Flip-flop: true from the "Directory" line through the next blank line
        my $seq = /^Directory/ .. /^\s*$/;
        next unless $seq;

        $content .= $_;
        next unless $seq =~ /E0$/;    # "E0" marks the last line of the range

        $content =~ s{\s+$}{};        # Don't need that trailing whitespace

        $info{$type} = $content if defined $type;    # Or push @{ $info{$type} }, $content;
        undef $type;
        undef $content;
    }
}

This is a bit old, but I'm a fan of this article as a way to learn about the flip-flop ('..') operator: http://www.perl.com/pub/2004/06/18/variables.html – Telemachus

Pushing the strings onto '@content' and then processing them as a list or joining them is cheaper than '.='. – mrk
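A sketch of the variant mrk describes: each section's table lines are collected in an array and joined once at the end, instead of growing a string with '.=' on every line. The input is a small made-up copy of the question's layout, and %lines_for is a hypothetical name. The performance claim is mrk's. This only shows the shape of the code:

```perl
use strict;
use warnings;

# Small made-up input in the same layout as the question's sample.
my $searched = <<'END';
The following are the MODIFIED files in Task 50104 :

Directory    Filename    Version
---------    --------    -------
Something    Something    .....

The following are the files to be COMPILED in Task 50104 :

Directory    Filename
---------    --------
.........    .........

Comments:
Blah blah
END

my %lines_for;    # hypothetical: section type => array of that section's table lines
my $type;

open my $fh, '<', \$searched or die $!;
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^The following are .*\b(MODIFIED|COMPILED|DELETED)\b/) {
        $type = $1;    # remember which section we are in
        next;
    }
    # Flip-flop: from the "Directory" line through the next blank line;
    # the blank terminator itself is skipped by the /\S/ test.
    if ($line =~ /^Directory/ .. $line =~ /^\s*$/) {
        push @{ $lines_for{$type} }, $line if defined $type && $line =~ /\S/;
    }
}
close $fh;

# Build each section's text with a single join instead of repeated .=
my %info;
for my $section (keys %lines_for) {
    $info{$section} = join "\n", @{ $lines_for{$section} };
}

print "$_:\n$info{$_}\n\n" for sort keys %info;
```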