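I'd like some help with a Perl problem: downloading XML results with LWP::UserAgent.
I need to download an XML file of query results, parse it, grab the link to the next result set from the XML, and repeat the download.
I've been able to download and parse the first result set fine.
I grab the next URL, but the returned results don't seem to change. That is, the second time through the loop, `$res->content` is the same as the first time, so the value of `$url` never changes after the first download.
I suspect this is a scope problem, but I can't seem to get a handle on it.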
    use LWP::UserAgent;
    use HTTP::Cookies;
    use Data::Dumper;
    use XML::LibXML;
    use strict;

    my $url = "http://quod.lib.umich.edu/cgi/f/findaid/findaid-idx?c=bhlead&cc=bhlead&type=simple&rgn=Entire+Finding+Aid&q1=civil+war&Submit=Search;debug=xml";

    while ($url ne "") {
        my $ua = LWP::UserAgent->new();
        $ua->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');
        $ua->timeout(30);
        $ua->default_header('pragma' => "no-cache", 'max-age' => '0');

        print "Download URL:\n$url\n\n";
        my $res = $ua->get($url);
        if ($res->is_error) {
            print STDERR __LINE__, " Error: ", $res->status_line, " ", $res;
            exit;
        }

        my $parser = XML::LibXML->new();
        my $doc = $parser->load_xml(string => $res->content);

        # grab the URL of the next result set
        $url = $doc->findvalue('//ResultsLinks/SliceNavigationLinks/NextHitsLink');
        print "NEXT URL:\n$url\n\n";
    }
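
One way to check the scope suspicion without touching the network is sketched below. It is only an illustration under stated assumptions: `follow_links`, `%next_link_for`, and the page names `page1` and `page2` are hypothetical and do not come from the script above. The callback stands in for the download-and-parse step (in the real script it would presumably wrap `$ua->get` and `findvalue`). The sketch shows that a `$url` declared outside the loop is updated by an assignment inside it, and it adds a `%seen` guard so the loop stops when the "next" link repeats a URL that was already fetched.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real download-and-parse step: maps each
# page URL to the "next" link that page reports. 'page2' points at itself,
# mimicking the behaviour described in the question.
my %next_link_for = (
    'page1' => 'page2',
    'page2' => 'page2',
);

# Follow "next" links starting from $start, calling $get_next->($url) for
# each page. Stops when the link is empty, undefined, or already visited.
sub follow_links {
    my ($start, $get_next) = @_;
    my $url = $start;    # declared once, outside the loop
    my %seen;
    my @visited;
    while (defined $url && $url ne '') {
        if ($seen{$url}++) {
            warn "Next link repeats an already fetched URL: $url\n";
            last;
        }
        push @visited, $url;
        $url = $get_next->($url);    # assigns to the outer $url
    }
    return @visited;
}

my @visited = follow_links('page1', sub { $next_link_for{ $_[0] } });
print join(' -> ', @visited), "\n";
```

If the loop variable were a scope problem, the second iteration would still see `page1`. Here it sees `page2`, and the guard ends the loop once the same link comes back again instead of requesting it forever.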
What output do you get from the `print` lines? – cjm 2011-02-15 06:22:48
Download URL: http://quod.lib.umich.edu/cgi/f/findaid/findaid-idx?c=bhlead&cc=bhlead&type=simple&rgn=Entire+Finding+Aid&q1=civil+war&Submit=Search;debug=xml Download URL: http://quod.lib.umich.edu/cgi/f/findaid/findaid-idx?c=bhlead;cc=bhlead;type=simple;rgn=Entire%20Finding%20Aid;q1=civil%20war;debug=xml;view=reslist;subview=short;sort=occur;start=26;size=25 NEXT URL: http://quod.lib.umich.edu/cgi/f/findaid/findaid-idx?c=bhlead;cc=bhlead;type=simple;rgn=Entire%20Finding%20Aid;q1=civil%20war;debug=xml;view=reslist;subview=short;sort=occur;start=26;size=25 – Matt 2011-02-15 14:17:49