2010-05-17

Perl regex to match across multiple lines

I have a file like this:

01 00 01 14 c0 00 01 10 01 00 00 16 00 00 00 64 
00 00 00 65 00 00 01 07 40 00 00 22 68 61 6c 2e 
6f 70 65 6e 65 74 2e 63 6f 6d 3b 30 30 30 30 30 
30 30 30 32 3b 30 00 00 00 00 01 08 40 00 00 1e 
68 61 6c 2e 6f 70 65 6e 65 74 2d 74 65 6c 65 63 
6f 6d 2e 6c 61 6e 00 00 00 00 01 28 40 00 00 21 
72 65 61 6c 6d 31 2e 6f 70 65 6e 65 74 2d 74 65 
6c 65 63 6f 6d 2e 6c 61 6e 00 00 00 00 00 01 25 
40 00 00 1e 68 61 6c 2e 6f 70 65 6e 65 74 2d 74 
65 6c 65 63 6f 6d 2e 6c 61 6e 00 00 00 00 01 1b 
40 00 00 20 72 65 61 6c 6d 2e 6f 70 65 6e 65 74 
2d 74 65 6c 65 63 6f 6d 2e 6c 61 6e 00 00 01 02 
40 00 00 0c 01 00 00 16 00 00 01 a0 40 00 00 0c 
00 00 00 01 00 00 01 9f 40 00 00 0c 00 00 00 00 
00 00 01 16 40 00 00 0c 00 00 00 00 00 00 01 bb 
40 00 00 28 00 00 01 c2 40 00 00 0c 00 00 00 00 
00 00 01 bc 40 00 00 13 31 39 37 37 31 31 31 32 
32 33 31 00 

I read the file, find certain byte sequences, and replace them with tags:

# Read FH one line at a time; each pass only ever sees a single line of the dump
while (my $line = <FH>) {
    $line =~ s/(00 00 00 64)/<incr4> /g;
    $line =~ s/(00 00 00 65)/<incr4> /g;
    $line =~ s/(30 30 30 30 30 32)/<incr6ascii:999999:0>/g;
    $line =~ s/(31 31 32 32 33 31)/<incr6ascii:999999:0>/g;
    print OUTPUT $line;
}

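So, for example, 00 00 00 64 gets replaced by the <incr4> tag. This works fine, but it can't seem to match across lines. For example, the pattern 31 31 32 32 33 31 runs over two lines, and the regex doesn't catch it. I tried the /m and /s pattern modifiers to make it ignore the newlines, but they didn't match either. The only way around it I could come up with was to read the whole file into a string: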

my $whole_file = do { local $/; <FH> };   # slurp the whole file; $/ is restored after the block
my $line = $whole_file;
$line =~ s/(00 00 00 64)/<incr4> /g;
$line =~ s/(00 00 00 65)/<incr4> /g;
$line =~ s/(30 30 30 30 30 32)/<incr6ascii:999999:0>/g;
$line =~ s/(31 31 32 32 33 31)/<incr6ascii:999999:0>/g;
print OUTPUT $line;

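This works, and the tags are inserted correctly, but the structure of the file changes completely: everything ends up on a single line. I want to keep the file's structure as shown above. Any ideas on how I can do this?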

/John
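A quick check of the /m and /s attempt described in the question. /m only changes what ^ and $ match, and /s only lets . match a newline, so neither one makes a literal space in the pattern match the line break between 32 and 32; \s+ does, because \s covers both spaces and newlines. This is a minimal sketch; the sample string is an assumption standing in for the file, built from the last two lines of the dump:

```perl
use strict;
use warnings;

# Last two lines of the dump: "31 31 32" ends one line, "32 33 31" starts the next.
my $text = "00 00 01 bc 40 00 00 13 31 39 37 37 31 31 31 32 \n32 33 31 00 \n";

# Literal spaces: no match, with or without /m and /s (neither affects a space).
my $plain = ($text =~ /31 31 32 32 33 31/)   ? 1 : 0;
my $ms    = ($text =~ /31 31 32 32 33 31/ms) ? 1 : 0;

# \s+ matches any run of whitespace, including the " \n" at the line break.
my $ws = ($text =~ /31\s+31\s+32\s+32\s+33\s+31/) ? 1 : 0;

print "plain=$plain ms=$ms ws=$ws\n";   # prints "plain=0 ms=0 ws=1"
```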

Answer


The trick here is to match all the whitespace, spaces and newlines alike, with the same character class, \s:

my $file = do {local (@ARGV, $/) = 'filename.txt'; <>}; # slurp file 

my %tr = ( # setup a translation table 
    '00 00 00 64'  => '<incr4>', 
    '00 00 00 65'  => '<incr4>', 
    '30 30 30 30 30 32' => '<incr6ascii:999999:0>', 
    '31 31 32 32 33 31' => '<incr6ascii:999999:0>', 
); 

for (keys %tr) { 
    my $re = join '\s+' => split; # construct new regex 

    $file =~ s{($re)}{ 
     $1 =~ /\n/ ? "\n$tr{$_}" : $tr{$_} # if octets contained \n, add \n 
    }ge # match multiple times, execute the replacement block as perl code 
} 
print $file; 
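A self-contained sketch of the same approach, assuming an in-memory string in place of filename.txt (the four sample lines are copied from the dump in the question). It keeps the answer's translation table; the differences are that the keys are sorted only so the run is repeatable (hash order is not), and index() replaces the nested regex when testing the match for a newline:

```perl
use strict;
use warnings;

# Hypothetical in-memory input standing in for the slurped file.
my $file = "01 00 01 14 c0 00 01 10 01 00 00 16 00 00 00 64 \n"
         . "00 00 00 65 00 00 01 07 40 00 00 22 68 61 6c 2e \n"
         . "00 00 01 bc 40 00 00 13 31 39 37 37 31 31 31 32 \n"
         . "32 33 31 00 \n";

# Same translation table as in the answer.
my %tr = (
    '00 00 00 64'       => '<incr4>',
    '00 00 00 65'       => '<incr4>',
    '30 30 30 30 30 32' => '<incr6ascii:999999:0>',
    '31 31 32 32 33 31' => '<incr6ascii:999999:0>',
);

for my $octets (sort keys %tr) {
    # "31 31 32" becomes the pattern 31\s+31\s+32, so a line break
    # between two octets matches just like a space does.
    my $re = join '\s+', split ' ', $octets;

    # If the matched octets straddled a line break, put a newline in front
    # of the tag so the line break survives the substitution.
    $file =~ s{($re)}{ index($1, "\n") >= 0 ? "\n$tr{$octets}" : $tr{$octets} }ge;
}

print $file;
```

The output keeps four lines: the third now ends after "37 37 31", and <incr6ascii:999999:0> starts the fourth line, because the newline inside the match is moved in front of the tag.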

Great! Works perfectly... I never would have thought of using a hash. Clever solution! – John 2010-05-17 21:32:50


+1: Nice solution, just put the /x modifier at the end! – Zaid 2010-05-18 10:25:50