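HTML::TagFilter returning HTML::Element HASH object
I'm new to Perl and am enjoying exploring what I can do with it, along with the support and documentation available for all these great libraries; however, I've run into a problem with a script I'm working on. Before adding HTML::TagFilter, I used line 63 (print FH $tree->as_HTML) to write the HTML content I was after to a file; specifically, everything inside the body tag. Now I only want to print the p tags, h tags, and img tags, without any attributes. When I run my code, the files are created in the correct directory, but each file contains a stringified hash object (HTML::Element=HASH(0x3a104c8)) instead of the filtered HTML.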
use open qw(:locale);
use strict;
use warnings qw(all);
use HTML::TreeBuilder 5 -weak; # Ensure weak references in use
use URI::Split qw/ uri_split uri_join /;
use HTML::TagFilter;
my @links;
open(FH, "<", "index/site-index.txt")
    or die "Failed to open file: $!\n";
while(<FH>) {
    chomp;
    push @links, $_;
}
close FH;
my $dir = "";
while($dir eq ""){
    print "What is the name of the site we are working on? ";
    $dir = <STDIN>;
    chomp $dir;
}
#make directory to store files
mkdir($dir);
my $entities = "";
my $indent_char = "\t";
my $filter = HTML::TagFilter->new(
    allow => {
        p   => { none => [] },
        h1  => { none => [] },
        h2  => { none => [] },
        h3  => { none => [] },
        h4  => { none => [] },
        h5  => { none => [] },
        h6  => { none => [] },
        img => { none => [] },
    },
    log_rejects => 1,
    strip_comments => 1
);
foreach my $url (@links){
    #print $url;
    my ($filename) = $url =~ m#([^/]+)$#;
    #print $filename;
    $filename =~ tr/=/_/;
    $filename =~ tr/?/_/;
    #print "\n";
    my $currentfile = $dir . '/' . $filename . '.html';
    print "Preparing " . $currentfile . "\n" . "\n";
    open (FH, '>', $currentfile)
        or die "Failed to open file: $!\n";
    my $tree = HTML::TreeBuilder->new_from_url($url);
    $tree->parse($url);
    $tree = $tree->look_down('_tag', 'body');
    if($tree){
        $tree->dump; # a method we inherit from HTML::Element
        print FH $filter->filter($tree);
        #print FH $tree->as_HTML($entities, $indent_char), "\n";
    } else{
        warn "No body tag found";
    }
    print "File " . $currentfile . " completed.\n" . "\n";
    close FH;
}
Why is this happening, and how can I print the actual content I am looking for?
Thank you.
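One likely explanation, offered as a sketch rather than a definitive diagnosis: HTML::TagFilter's filter method works on a string of HTML, whereas $tree in the script is an HTML::Element object. When Perl uses an object where a string is expected, it falls back to the default stringification, which looks like ClassName=HASH(0x...). The example below reproduces that behaviour with core Perl only. Demo::Element and filter_string are hypothetical stand-ins invented for illustration; they are not part of HTML::Element or HTML::TagFilter.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::Element, so the example runs without CPAN modules.
package Demo::Element;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Plays the role of as_HTML: serializes the object into markup text.
sub as_HTML {
    my ($self) = @_;
    return sprintf '<%s>%s</%s>', $self->{tag}, $self->{text}, $self->{tag};
}

package main;

# Hypothetical stand-in for a filter that expects a string of HTML.
sub filter_string {
    my ($html) = @_;
    return "$html";    # string context, as any text-processing code would impose
}

my $node = Demo::Element->new(tag => 'p', text => 'Hello');

my $wrong = filter_string($node);             # the object itself gets stringified
my $right = filter_string($node->as_HTML);    # the markup is serialized first

print "$wrong\n";    # something like Demo::Element=HASH(0x...)
print "$right\n";    # <p>Hello</p>
```

If the same thing is happening in the script, the corresponding change would be to serialize the body before filtering, for example print FH $filter->filter($tree->as_HTML($entities, $indent_char)); so that the filter receives markup text rather than the object. Separately, HTML::TreeBuilder->new_from_url already fetches and parses the page, so the $tree->parse($url) call that follows it looks unnecessary and may be worth removing.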
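Awesome! Thank you so much for your help! I knew there was more I needed to do. Now I understand why the object was being printed instead of the content. Thanks again!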