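To do this with HTML::TreeBuilder, you'd read the file, modify the tree, and write it out (either to the same file or to a different one). This is fairly complex, because you're trying to convert part of a text node into a tag, and because you have comments that must not move.
A common idiom with HTML-Tree is to use a recursive function that modifies the tree: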
use strict;
use warnings;
use 5.008;
use File::Slurp 'read_file';
use HTML::TreeBuilder;

sub replace_keyword
{
  my $elt = shift;

  return if $elt->is_empty;

  $elt->normalize_content;      # Make sure text is contiguous

  my $content = $elt->content_array_ref;

  for (my $i = 0; $i < @$content; ++$i) {
    if (ref $content->[$i]) {
      # It's a child element, process it recursively:
      replace_keyword($content->[$i])
          unless $content->[$i]->tag eq 'a'; # Don't descend into <a>
    } else {
      # It's text:
      if ($content->[$i] =~ /here/) { # your keyword or regexp here
        $elt->splice_content(
          $i, 1, # Replace this text element with...
          substr($content->[$i], 0, $-[0]), # the pre-match text
          # A hyperlink with the keyword itself:
          [ a => { href => 'http://example.com' },
            substr($content->[$i], $-[0], $+[0] - $-[0]) ],
          substr($content->[$i], $+[0]) # the post-match text
        );
      } # end if text contains keyword
    } # end else text
  } # end for $i in content index
} # end replace_keyword

my $content = read_file('foo.shtml');

# Wrap the SHTML fragment so the comments don't move:
my $html = HTML::TreeBuilder->new;
$html->store_comments(1);
$html->parse("<html><body>$content</body></html>");
$html->eof; # Signal end of input so the tree is finalized

my $body = $html->look_down(qw(_tag body));
replace_keyword($body);

# Now strip the wrapper to get the SHTML fragment back:
$content = $body->as_HTML;
$content =~ s!^<body>\n?!!;
$content =~ s!</body>\s*\z!!;

print STDOUT $content; # Replace STDOUT with a suitable filehandle
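The three substr calls in replace_keyword rely on Perl's built-in @- and @+ arrays, which hold the start and end offsets of the last successful match. As a minimal sketch of just that step, using only core Perl, the split can be tried on a plain string. The helper name split_on_match and the sample text are made up for illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text around the first match of $re.
# Returns (pre, match, post), or an empty list when nothing matches.
sub split_on_match {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    # $-[0] is the offset where the match starts, $+[0] where it ends.
    return (
        substr($text, 0, $-[0]),              # text before the match
        substr($text, $-[0], $+[0] - $-[0]),  # the matched keyword
        substr($text, $+[0]),                 # text after the match
    );
}

my ($pre, $match, $post) = split_on_match('click here for more', qr/here/);
print "[$pre] [$match] [$post]\n";   # prints: [click ] [here] [ for more]
```

In the script, the middle piece becomes the content of the new <a> element, and the post-match piece is examined again on a later loop iteration, so further occurrences in the same text node are also linked.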
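The output from as_HTML will be syntactically correct HTML, but not necessarily nicely formatted for people reading the source. If you want that, you can use HTML::PrettyPrinter to write out the file.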
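The wrapper-stripping step at the end of the script can be checked on its own without parsing anything. The sketch below applies the same two substitutions to a hand-written string. It assumes as_HTML returns something shaped like <body>...</body> followed by a newline; the helper name strip_body_wrapper and the sample markup are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the <body> wrapper that was added before parsing.
sub strip_body_wrapper {
    my ($html) = @_;
    $html =~ s!^<body>\n?!!;     # opening tag plus an optional newline
    $html =~ s!</body>\s*\z!!;   # closing tag plus any trailing whitespace
    return $html;
}

# Hand-written sample standing in for the result of $body->as_HTML.
my $sample = q{<body><!--#include virtual="header.html" -->}
           . q{Some text <a href="http://example.com">here</a></body>}
           . "\n";

print strip_body_wrapper($sample), "\n";
```

Only the outermost wrapper is removed, so the SSI comment and the inserted link come back exactly where they were in the fragment.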
Source
2010-10-11 00:17:45
cjm
Without seeing your code, it's hard to say where the problem is. – Ether 2010-10-10 15:30:54
Can you give a sample line of HTML? – Ruel 2010-10-10 15:34:00
I added an example. – snoofkin 2010-10-10 18:18:04