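Can't use string as a HASH ref?
I want to parse an HTML document for a web indexing program. To do this, I am using HTML::TokeParser.
I get an error on the last line of my first if statement:
if ($token->[1] eq 'a') {
#href attribute of tag A
my $suffix = $token->[2]{href};
It says: Can't use string ("</a>") as a HASH ref while "strict refs" in use at ./indexer.pl line 270, <PAGE_DIR> line 1.
Is my problem that something ($suffix, or </a>?) is a string that needs to be turned into a hash ref? I have looked at other posts with similar errors, but I am still clueless. Thanks for your help.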
sub parse_document {
    #passed from input
    my $html_filename = $_[0];
    #base url for links
    my $base_url = $_[1];
    #created to hold tokens
    my @tokens =();
    #created for doc links
    my @links =();
    #creates parser
    my $p = HTML::TokeParser->new($html_filename);
    #loops through doc tags
    while (my $token = $p->get_token()) {
        #code for retrieving links
        if ($token->[1] eq 'a') {
            # href attribute of tag A
            my $suffix = $token->[2]{href};
            #if href exists & isn't an email link
            if (defined($suffix) && !($suffix =~ "^mailto:")) {
                #make the url absolute
                my $new_url = make_absolute_url $base_url, $suffix;
                #make sure it's of the http:// scheme
                if ($new_url =~ "^http://"){
                    #normalize the url
                    my $new_normalized_url = normalize_url $new_url;
                    #add it to links array
                    push(@links, $new_normalized_url);
                }
            }
        }
        #code for text words
        if ($token->[0] eq 'T') {
            my $text = $token->[1];
            #add words to end of array
            #(split by non-letter chars)
            my @words = split(/\P{L}+/, $text);
        }
    }
    return (\@tokens, \@links);
}
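The error message fits the token shapes that HTML::TokeParser documents: a start tag comes back as ["S", $tag, $attr, $attrseq, $text], while an end tag comes back as ["E", $tag, $text]. For a closing </a>, $token->[1] is also 'a', but $token->[2] is the string "</a>" rather than an attribute hash, so $token->[2]{href} dies under strict refs. Below is a minimal sketch of a guard. It uses hand-built tokens instead of a real parser so it runs with core Perl only; the sample tokens and the helper name extract_hrefs are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hand-built tokens mirroring the shapes documented for HTML::TokeParser
# (assumption: these stand in for what get_token() would return).
my @sample_tokens = (
    [ 'S', 'a', { href => 'page.html' }, ['href'], '<a href="page.html">' ],
    [ 'T', 'a link', '' ],
    [ 'E', 'a', '</a>' ],
);

# Hypothetical helper: collect href values from <a> start tags only.
sub extract_hrefs {
    my @tokens = @_;
    my @hrefs;
    for my $token (@tokens) {
        # Only start tags ('S') carry an attribute hash at index 2;
        # for end tags ('E') index 2 is the raw tag text.
        next unless $token->[0] eq 'S' && $token->[1] eq 'a';
        my $href = $token->[2]{href};
        push @hrefs, $href if defined $href;
    }
    return @hrefs;
}

my @hrefs = extract_hrefs(@sample_tokens);
print "$_\n" for @hrefs;
```

In the loop from the question, the equivalent change would probably be to test the token type first, i.e. if ($token->[0] eq 'S' && $token->[1] eq 'a'), so that end tags and text tokens never reach the {href} lookup.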
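I would print out some debug statements to see exactly what it thinks the token is, by dumping $token with Data::Dumper, and also look at what $token->[1] is. It could be a ' or something similar messing up the values. – scrappedcola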
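A rough sketch of that debugging step, assuming HTML::TokeParser and Data::Dumper are installed. The one-line HTML string is a made-up sample; the real script would pass its file name to the constructor instead.

```perl
use strict;
use warnings;
use Data::Dumper;
use HTML::TokeParser;

# Assumed sample document; the real script reads a file instead.
my $html = '<a href="page.html">link</a>';

# A reference to a scalar is treated as the literal document text.
my $p = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

# Compact, stable dump output.
$Data::Dumper::Indent   = 1;
$Data::Dumper::Sortkeys = 1;

my @seen;
while (my $token = $p->get_token) {
    push @seen, $token;      # kept so the tokens can be inspected afterwards
    print Dumper($token);    # shows the type in [0] and what sits in [2]
}
```

The dump should show that index 2 holds a hash reference for the opening tag and a plain string for the closing tag, which is the string the error message complains about.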