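Perl XPath parser help

I want to fetch data from the World Bank site's XML DB files using the XML::XPath parser. The problem is that I don't see any output, so I must be missing something in the code. Ideally, I want to extract the death rate statistics (year and value) from each country's XML DB file. I am using this as part of my input: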
http://data.worldbank.org/sites/default/files/countries/en/afghanistan_en.xml
use strict;
use LWP 5.64;
use HTML::ContentExtractor;
use XML::XPath;

my $agent1 = LWP::UserAgent->new;
my $extractor = HTML::ContentExtractor->new();

#Retrieve main Worldbank country site
my $mainlink = "http://data.worldbank.org/country/";
my $page = $agent1->get("$mainlink");
my $fulltext = $page->decoded_content();

#Match to just all available countries in Worldbank
my $country = "";
my @countryList;
if (@countryList = $fulltext =~ m/(http:\/\/data\.worldbank\.org\/country\/.*?")/gi){
    foreach $country (@countryList){
        #Remove " at the end of link
        $country=~s/\"//gi;
        print "\n" . $country;

        #Retrieve each country profile's XML DB file
        my $page = $agent1->get("$country");
        my $fulltext = $page->decoded_content();
        my $XML_DB = "";
        my @countryXMLDBList;
        if (@countryXMLDBList = $fulltext =~ m/(http:\/\/data\.worldbank\.org\/sites\/default\/files\/countries\/en\/.*?\.xml)/gi){
            foreach $XML_DB (@countryXMLDBList){
                my $page = $agent1->get("$XML_DB");
                my $fulltext = $page->decoded_content();
                #print $fulltext;

                #Use XML XPath parser to find elements related to death rate
                my $xp = XML::XPath->new($fulltext); #my $xp = XML::XPath->new("afghanistan_en.xml");
                my $nodeSet = $xp->find("//*");
                if (!$nodeSet->isa('XML::XPath::NodeSet') || $nodeSet->size() == 0) {
                    #No match found
                    print "\nMatch not found!";
                    exit;
                } else {
                    foreach my $node ($nodeSet->get_nodelist){
                        print "\n" . $node->find('country')->string_value;
                        print "\n" . $node->find('indicator')->string_value;
                        print "\n" . $node->find('year')->string_value;
                        print "\n" . $node->find('value')->string_value;
                        exit;
                    }
                }
            }
            #Build line graph based on death rate statistics and output some image file format
        }
    }
}
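One likely reason for the empty output: `//*` matches every element, so the first node in the set is the document's root element, which has no direct `country`, `indicator`, `year`, or `value` children, and the `exit` inside the loop stops the script right after that first node. A minimal sketch of a narrower query follows, assuming the records look like the `<data>` elements shown in this post. The `<countrydata>` root element and the inline sample string are hypothetical stand-ins for the downloaded file.

```perl
use strict;
use warnings;
use XML::XPath;

# Sample records copied from the question; the <countrydata> root
# element is a hypothetical wrapper, not taken from the real file.
my $xml = <<'XML';
<countrydata>
  <data>
    <country id="AFG">Afghanistan</country>
    <indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator>
    <year>2006</year>
    <value>20.3410000</value>
  </data>
  <data>
    <country id="AFG">Afghanistan</country>
    <indicator id="IC.EXP.DOCS">Documents to export (number)</indicator>
    <year>2005</year>
    <value>7.0000000</value>
  </data>
</countrydata>
XML

my $xp = XML::XPath->new(xml => $xml);

# Select only the <data> records whose indicator is the crude death rate.
my $records = $xp->find('//data[indicator/@id="SP.DYN.CDRT.IN"]');

my @rows;
foreach my $record ($records->get_nodelist) {
    # Relative paths are evaluated against the current <data> element.
    my $year  = $record->find('year')->string_value;
    my $value = $record->find('value')->string_value;
    push @rows, [$year, $value];
    print "$year\t$value\n";
}
```

In the original script the same idea would mean replacing `"//*"` with the predicate query and dropping the `exit` inside the `foreach`, so that every matching record is visited.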
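I am also considering the XPath `following-sibling` axis, but I don't know how to use it correctly. For example, given the XML data below, I only care about pulling the siblings that directly follow the death rate indicator element.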
<data>
<country id="AFG">Afghanistan</country>
<indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator>
<year>2006</year>
<value>20.3410000</value>
</data>
<data>
<country id="AFG">Afghanistan</country>
<indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator>
<year>2007</year>
<value>19.9480000</value>
</data>
<data>
<country id="AFG">Afghanistan</country>
<indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator>
<year>2008</year>
<value>19.5720000</value>
</data>
<data>
<country id="AFG">Afghanistan</country>
<indicator id="IC.EXP.DOCS">Documents to export (number)</indicator>
<year>2005</year>
<value>7.0000000</value>
</data>
<data>
<country id="AFG">Afghanistan</country>
<indicator id="IC.EXP.DOCS">Documents to export (number)</indicator>
<year>2006</year>
<value>12.0000000</value>
</data>
<data>
<country id="AFG">Afghanistan</country>
<indicator id="IC.EXP.DOCS">Documents to export (number)</indicator>
<year>2007</year>
<value>12.0000000</value>
</data>
Any help would be greatly appreciated!
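For the `following-sibling` idea, a rough sketch might look like the one below. It assumes `<year>` and `<value>` are always siblings that come after `<indicator>` inside the same `<data>` element, as in the sample above. The `<countrydata>` root is a hypothetical wrapper added only so the snippet parses on its own.

```perl
use strict;
use warnings;
use XML::XPath;

# Records copied from the question; <countrydata> is a hypothetical root.
my $xml = <<'XML';
<countrydata>
  <data>
    <country id="AFG">Afghanistan</country>
    <indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator>
    <year>2007</year>
    <value>19.9480000</value>
  </data>
  <data>
    <country id="AFG">Afghanistan</country>
    <indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator>
    <year>2008</year>
    <value>19.5720000</value>
  </data>
  <data>
    <country id="AFG">Afghanistan</country>
    <indicator id="IC.EXP.DOCS">Documents to export (number)</indicator>
    <year>2005</year>
    <value>7.0000000</value>
  </data>
</countrydata>
XML

my $xp = XML::XPath->new(xml => $xml);

# Start at each death-rate <indicator>, then step along the
# following-sibling axis to the <year> and <value> after it.
my $indicators = $xp->find('//indicator[@id="SP.DYN.CDRT.IN"]');

my @pairs;
foreach my $indicator ($indicators->get_nodelist) {
    my $year  = $indicator->find('following-sibling::year')->string_value;
    my $value = $indicator->find('following-sibling::value')->string_value;
    push @pairs, "$year=$value";
}
print join("\n", @pairs), "\n";
```

Since siblings never cross a parent boundary, each lookup stays inside its own `<data>` record. Selecting the parent `<data>` element with a predicate and reading its children is probably the simpler route, though.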
You need to say what your problem actually is. What exactly is going wrong? Please edit your question to include it. – Zaid 2010-05-11 16:51:48
Sorry, I have now described the problem in more detail! – user338516 2010-05-11 17:46:36
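You may not want to use XML::XPath: the module is old, slow, and no longer actively maintained. I recommend switching to XML::LibXML. The API is almost identical, but it is much faster and better supported. – 2010-05-11 20:57:48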
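To illustrate the suggestion in the last comment, here is a hedged sketch of the same extraction with XML::LibXML. It assumes the module is installed and that the data follows the `<data>` layout from the question. The `<countrydata>` root and the inline sample string are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# One record copied from the question; <countrydata> is a hypothetical root.
my $xml = <<'XML';
<countrydata>
  <data>
    <country id="AFG">Afghanistan</country>
    <indicator id="SP.DYN.CDRT.IN">Death rate, crude (per 1,000 people)</indicator>
    <year>2008</year>
    <value>19.5720000</value>
  </data>
</countrydata>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# findnodes returns the matching <data> elements in list context;
# findvalue evaluates a relative path and returns its string value.
my @rows;
for my $record ($doc->findnodes('//data[indicator/@id="SP.DYN.CDRT.IN"]')) {
    push @rows, [ $record->findvalue('year'), $record->findvalue('value') ];
}
printf "%s\t%s\n", @$_ for @rows;
```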