2013-02-10

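Updating column values with HTML::TreeBuilder

I have an HTML file containing several tables (all of them have the same number of columns and the same column names). The tables are separated by other HTML tags.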

For every row of every table, I want to change the values of cell 1 and cell 3.

This is what I have so far (thanks to @depesz):

#!/usr/bin/env perl 
use strict; 
use warnings; 
use utf8; 
use open qw(:std :utf8); 

use HTML::TreeBuilder; 

my $input_file_name = shift; 

my $tree = HTML::TreeBuilder->new(); 
$tree->parse_file($input_file_name) or die "Cannot open or parse $input_file_name\n"; 
$tree->elementify(); 

my @tables = $tree->find_by_tag_name('table'); 
for my $table (@tables) {
    foreach my $row ($table->find_by_tag_name('tr')) {
        # search for cells within the current row, not the whole table
        foreach my $column ($row->find_by_tag_name('td')) {
            # how do I change the text of the 1st and 3rd columns to "removed"?
        }
    }
}

print $tree->as_HTML(); 
exit; 

It works well for iterating over all the rows in the HTML file. I just don't know how to do the last bit: changing the text in columns 1 and 3.

Answer


The HTML::TreeBuilder::XPath module gives you more convenient access to the HTML nodes in a document.

Take a look at this program as an example. It seems to do what you need.

use strict; 
use warnings; 

use HTML::TreeBuilder::XPath; 

my $tree = HTML::TreeBuilder::XPath->new_from_file('anon.html'); 

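for my $table ($tree->findnodes('//table')) {
    my $row = 0;
    # './/tr' searches only inside the current table; '//tr' would search
    # the whole document again for every table.
    for my $tr ($table->findnodes('.//tr')) {
        $row++;
        # the 1st and 3rd <td> children of this row
        for my $td ($tr->findnodes('td[position() = 1 or position() = 3]')) {
            $td->delete_content;
            $td->push_content("name$row");
        }
    }
}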

print $tree->as_HTML('<>&', ' '); 

Works like a charm. Thanks! – smithy 2013-02-10 15:51:14
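For comparison, the same change can also be sketched with plain HTML::TreeBuilder, the module the question started with, without XPath. This is a minimal sketch under stated assumptions, not the answerer's code: the sample HTML and the helper name `blank_columns` are hypothetical, and it assumes the cells to change are the direct `<td>` children of each `<tr>`. It relies only on the HTML::Element methods `look_down`, `content_list`, `delete_content`, `push_content` and `as_HTML`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Hypothetical sample input: two tables separated by other markup.
my $html = <<'HTML';
<html><body>
<table>
<tr><td>a1</td><td>b1</td><td>c1</td></tr>
<tr><td>a2</td><td>b2</td><td>c2</td></tr>
</table>
<p>separator</p>
<table>
<tr><td>x1</td><td>y1</td><td>z1</td></tr>
</table>
</body></html>
HTML

# Hypothetical helper: set the text of the given 0-based columns to $text
# in every row of every table in $tree.
sub blank_columns {
    my ($tree, $text, @columns) = @_;
    for my $tr ($tree->look_down(_tag => 'tr')) {
        # Only the row's direct <td> children; text nodes are plain strings.
        my @cells = grep { ref $_ && $_->tag eq 'td' } $tr->content_list;
        for my $i (@columns) {
            next unless $cells[$i];          # row has fewer cells than expected
            $cells[$i]->delete_content;      # drop the old text
            $cells[$i]->push_content($text); # insert the new text node
        }
    }
}

my $tree = HTML::TreeBuilder->new_from_content($html);
blank_columns($tree, 'removed', 0, 2);       # columns 1 and 3
my $out = $tree->as_HTML('<>&', '  ');
$tree->delete;                               # release the tree when done
print $out, "\n";
```

Filtering `content_list` instead of calling `find_by_tag_name('td')` on the row keeps the cells of any nested table from being counted as columns of the outer row, and passing the column indexes in lets the same helper cover other columns.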