2012-10-23

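Extracting table contents using Perl

I am trying to extract table contents from an HTML file using HTML::TableExtract. My problem is the way my HTML file is structured, which is as follows: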

<!DOCTYPE html> 
<html> 
<body> 

    <h4>One row and three columns:</h4> 

    <table border="1"> 
     <tr> 
     <td> 
     <p> 100 </p></td> 
     <td> 
     <p> 200 </p></td> 
     <td> 
     <p> 300 </p></td> 
     </tr> 
     <tr> 
     <td> 
     <p> 400 </p></td> 
     <td> 
     <p> 500 </p></td> 
     <td> 
     <p> 600 </p></td> 
     </tr> 
    </table> 
</body> 
</html> 

Because of this structure, my output looks like this:

100| 

    200| 

    300| 

    400| 

    500| 

    600| 

instead of what I want:

100|200|300| 
    400|500|600| 

Can you help? Here is my Perl code:

use strict; 
use warnings; 
use HTML::TableExtract; 

my $te = HTML::TableExtract->new(); 
$te->parse_file('Table_One.html'); 

open (DATA2, ">TableOutput.txt") 
    or die "Can't open file"; 

foreach my $ts ($te->tables()) {
    foreach my $row ($ts->rows()) {
        my $Final = join('|', @$row);
        print DATA2 "$Final";
    }
}
close (DATA2); 

Answers

sub trim(_) { my ($s) = @_; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s } 

Or, in Perl 5.14+:

sub trim(_) { $_[0] =~ s/^\s+//r =~ s/\s+\z//r } 

Then use:

my $Final = join '|', map trim, @$row; 
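
As a quick check of how this behaves, here is a minimal, self-contained sketch that applies the first trim above to a hand-written row. The contents of $row are an assumption standing in for the padded cell text that the question's table seems to produce; no HTML is parsed here.

```perl
use strict;
use warnings;

# trim() as defined in the answer: the (_) prototype makes it default to $_.
sub trim(_) { my ($s) = @_; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s }

# Hypothetical row data, padded with newlines and spaces like the question's cells.
my $row = [ "\n     100 ", "\n     200 ", "\n     300 " ];

# map calls trim on each cell via $_; join then glues the cleaned cells together.
my $Final = join '|', map trim, @$row;
print "$Final\n";   # prints 100|200|300
```

Because trim works on a copy ($s), the cells in @$row themselves are left unchanged.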

Why the parentheses? 'my ($s)' – Tim


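@Tim N, to force the use of the list assignment operator. Otherwise it would be a scalar assignment, which here would be the same as 'my $s = 1;'. – ikegami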
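
The difference described in this comment can be seen with a small sketch. The two sub names below are made up for illustration; only the presence or absence of the parentheses matters.

```perl
use strict;
use warnings;

# List assignment: $s receives the first element of @_.
sub with_parens    { my ($s) = @_; return $s }

# Scalar assignment: @_ is evaluated in scalar context, giving its element count.
sub without_parens { my $s   = @_; return $s }

my $first = with_parens('  100  ');
my $count = without_parens('  100  ');

print "with parens:    '$first'\n";   # prints '  100  '
print "without parens: '$count'\n";   # prints '1'
```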


Try this:

use strict; 
use warnings; 
use HTML::TableExtract; 

my $te = HTML::TableExtract->new(); 
$te->parse_file('Table_One.html'); 

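# Three-argument open with a lexical filehandle; $! reports why it failed.
open(my $out, '>', 'TableOutput.txt') or die "Can't open TableOutput.txt: $!";
foreach my $ts ($te->tables())
{
    foreach my $row ($ts->rows())
    {
        # Remove all whitespace (\s already matches newlines) from every cell.
        s/\s+//g for @$row;
        my $Final = join('|', @$row);
        # End each row with a newline so rows do not run together.
        print {$out} "$Final\n";
    }
}
close($out) or die "Can't close TableOutput.txt: $!";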
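
One caveat with this approach: the substitution deletes every whitespace character, not only the padding around each cell, so a cell containing several words would be squashed together. The sketch below contrasts it with trimming only the ends. The sample cells, including "New York", are invented for illustration and do not come from the question's table.

```perl
use strict;
use warnings;

# Invented sample cells, padded like the question's data.
my @cells = ("\n     100 ", "\n     New York ");

# Approach from this answer: delete every whitespace character (on a copy).
my @stripped = map { (my $c = $_) =~ s/\s+//g; $c } @cells;

# Alternative: remove whitespace only at the start and end of each cell.
my @trimmed = map { (my $c = $_) =~ s/^\s+|\s+\z//g; $c } @cells;

print join('|', @stripped), "\n";   # prints 100|NewYork
print join('|', @trimmed),  "\n";   # prints 100|New York
```

For purely numeric cells like the ones in the question, both give the same result.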

Great! Thank you – user1769222


Could you take a look at the edited question? – user1769222


Using Mojo::DOM:

#!/usr/bin/env perl 

use strict; 
use warnings; 

use Mojo::DOM; 
my $dom = Mojo::DOM->new(<<'END'); 
<!DOCTYPE html> 
<html> 
<body> 

    <h4>One row and three columns:</h4> 

    <table border="1"> 
     <tr> 
     <td> 
     <p> 100 </p></td> 
     <td> 
     <p> 200 </p></td> 
     <td> 
     <p> 300 </p></td> 
     </tr> 
     <tr> 
     <td> 
     <p> 400 </p></td> 
     <td> 
     <p> 500 </p></td> 
     <td> 
     <p> 600 </p></td> 
     </tr> 
    </table> 
</body> 
</html> 
END 

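# One output line per table row: take each cell's text, trim it, join with '|'.
my $rows = $dom->find('table tr');
$rows->each(sub {
    print $_->find('td p')
      # map() is used here because pluck() is no longer available in recent
      # Mojo::Collection releases; the regex trims in case text() keeps padding.
      ->map(sub { $_->text =~ s/^\s+|\s+$//gr })
      ->join('|') . "|\n";
});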