How to parse an XML web page in Perl

Hi, currently I can parse an XML file if it has been saved from the web page into my folder.

use strict; 
use warnings; 
use Data::Dumper; 
use XML::Simple; 

my $parser = new XML::Simple; 
my $data = $parser->XMLin("config.xml"); 
print Dumper($data);

However, if I try to parse it directly from the website, it does not work.

use strict; 
use warnings; 
use Data::Dumper; 
use XML::Simple; 

my $parser = new XML::Simple; 
my $data = $parser->XMLin("http://website/computers/computers_main/config.xml"); 
print Dumper($data);

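It gives me the following error: "File does not exist: http://website/computers/computers_main/config.xml at test.pl line 12"

How do I parse multiple XML files from a web page? I have to grab several XML documents from the website and parse them. Can someone help me with this?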

Source

2012-05-08 Maxyie

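Super edit: this approach requires WWW::Mechanize, but it lets you log in to your website and then fetch the XML page. You will have to change a few things noted in the comments. Hope this helps.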

use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
use WWW::Mechanize;
use HTTP::Cookies;

# Create a new instance of Mechanize
my $bot = WWW::Mechanize->new();
# Create a cookie jar for the login credentials
$bot->cookie_jar(
    HTTP::Cookies->new(
        file           => "cookies.txt",
        autosave       => 1,
        ignore_discard => 1,
    )
);
# Connect to the login page
my $response = $bot->get('http://www.thePageYouLoginTo.com');
# Get the login form
$bot->form_number(1);
# Enter the login credentials.
# You're going to have to change the login and
# pass (on the left) to match the field names of the form you're logging
# into (found in the source of the website). Then you can put your
# respective credentials on the right.
$bot->field(login => 'thisIsWhereYourLoginInfoGoes');
$bot->field(pass => 'thisIsWhereYourPasswordInfoGoes');
$response = $bot->click();
# Get the XML page
$response = $bot->get('http://website/computers/computers_main/config.xml');
my $content = $response->decoded_content();
my $parser = XML::Simple->new;
my $data = $parser->XMLin($content);
print Dumper($data);

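Give this a go. It uses LWP::Simple, as mentioned above. It simply connects to the page, grabs that page's content (the XML file), and runs it through XMLin. Edit: added simple error checking on the $url line. Edit 2: keeping this code here because it should work if no login is required.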

use strict; 
use warnings; 
use Data::Dumper; 
use XML::Simple; 
use LWP::Simple; 

my $parser = XML::Simple->new;

my $url = 'http://website/computers/computers_main/config.xml';
# get() returns undef on failure, so stop here rather than parse nothing
my $content = get($url) or die "Unable to get $url\n";
my $data = $parser->XMLin($content);

print Dumper($data);
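
The question also asks about several XML files. A minimal sketch, assuming the same LWP::Simple approach as above, loops over a list of URLs and collects the parsed results. The helper names parse_xml_string and fetch_and_parse are hypothetical and not from the original post, and the URLs are expected on the command line.

```perl
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
use LWP::Simple qw(get);

# Parse an XML document that is already in memory as a string.
sub parse_xml_string {
    my ($xml) = @_;
    return XML::Simple->new->XMLin($xml);
}

# Fetch each URL and parse it. Failures are reported and skipped.
# Returns a hash reference mapping URL => parsed data.
sub fetch_and_parse {
    my (@urls) = @_;
    my %parsed;
    for my $url (@urls) {
        my $content = get($url);
        unless (defined $content) {
            warn "Unable to get $url\n";
            next;
        }
        my $data = eval { parse_xml_string($content) };
        if (defined $data) {
            $parsed{$url} = $data;
        }
        else {
            warn "Unable to parse $url: $@";
        }
    }
    return \%parsed;
}

# Hypothetical usage: perl test.pl URL1 URL2 ...
print Dumper(fetch_and_parse(@ARGV)) if @ARGV;
```

Collecting the results in a hash keyed by URL is only one possible choice; an array would do if the order of the files matters more than looking them up by address.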

Source

2012-05-08 05:03:27 iCanHasFay

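Hey, thanks for the reply. I tried it as described above, but for some reason I get the error "Unable to get URL". Any idea what might be wrong? I have installed both modules correctly. – Maxyie

Maxyie: http://stackoverflow.com/a/6296843 – daxim

I think it might be a problem with the URL, only because I am using a URL in the same format as above and it seems to work for me. Have you tried a different URL? You can google 'filetype:xml someQuery' to get some test XML files. Just grab their URLs and put them in the script above so we can see whether it is the URL or the script. – iCanHasFay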

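Read the documentation for XML::Simple. Note that the XMLin method can take a file name, a string of XML, or even an IO::Handle object. What it cannot take is a URL over HTTP.

Use the Perl module LWP::Simple to fetch the XML file you need and pass its content to XMLin.

You will have to download and install LWP::Simple using cpan, just as you did before for XML::Simple.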
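
When LWP::Simple's get fails it only returns undef, which makes an "Unable to get" error hard to diagnose. A minimal sketch, assuming LWP::UserAgent is installed (it ships in the same distribution as LWP::Simple), reports the HTTP status line instead. The helper name fetch_xml and the 30-second timeout are illustrative choices, not part of the original answer.

```perl
use strict;
use warnings;
use Data::Dumper;
use LWP::UserAgent;
use XML::Simple;

# Fetch a URL and return its body, or die with the HTTP status line.
sub fetch_xml {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->get($url);
    die "Unable to get $url: " . $response->status_line . "\n"
        unless $response->is_success;
    # charset => 'none' undoes any transfer encoding but leaves the bytes
    # undecoded, so the XML parser can honour the document's own
    # encoding declaration.
    return $response->decoded_content(charset => 'none');
}

# Hypothetical usage: perl test.pl http://website/computers/computers_main/config.xml
if (@ARGV) {
    my $data = XML::Simple->new->XMLin(fetch_xml($ARGV[0]));
    print Dumper($data);
}
```

A status line such as "404 Not Found" or "401 Unauthorized" usually shows whether the URL is wrong or the site requires a login.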

Source

2012-05-08 02:46:16

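A problem? That is not a problem. Neither is XML::Simple. – ysth

If you do not have any specific reason to stick with XML::Simple, then use another parser such as XML::Twig or XML::LibXML, which provide a built-in feature for parsing XML that is available on the web.

Here is simple code that does the same thing using XML::Twig: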

use strict; 
use warnings; 
use XML::Twig; 
use LWP::Simple; 

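my $url = 'http://website/computers/computers_main/config.xml';
# get() returns undef on failure; do not hand that to the parser
my $xml = LWP::Simple::get($url);
die "Unable to get $url\n" unless defined $xml;
my $twig = XML::Twig->new();
$twig->parse($xml);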

As already said, XML::Simple has no such built-in feature.
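
The built-in feature mentioned above is presumably XML::Twig's parseurl method. A minimal sketch, using its safe_parseurl variant (which returns false and sets $@ rather than dying), might look like this. The helper root_tag_of_string is hypothetical and is only there to show the same safe pattern on an in-memory string; the URL is expected on the command line.

```perl
use strict;
use warnings;
use XML::Twig;

# Parse XML held in a string; returns the root element's tag name,
# or undef if the input is missing or not well-formed.
sub root_tag_of_string {
    my ($xml) = @_;
    my $twig = XML::Twig->new();
    return undef unless $twig->safe_parse($xml);
    return $twig->root->tag;
}

# Hypothetical usage: perl test.pl http://website/computers/computers_main/config.xml
if (@ARGV) {
    my $url  = $ARGV[0];
    my $twig = XML::Twig->new(pretty_print => 'indented');
    if ($twig->safe_parseurl($url)) {
        $twig->print;    # write the parsed document to STDOUT
    }
    else {
        warn "Unable to fetch or parse $url: $@\n";
    }
}
```

If the download fails, the parser is given nothing to parse, which may be what lies behind a "no element found" error; checking the result this way at least makes the failure explicit.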

Source

2012-05-08 05:15:49 rpg

Hey, thanks for the reply, but with XML::Twig I get an error saying: "no element found at line 1, column 0, byte -1 at /ur/lib/perl5/site_perl/5/10/i686-cygwin/XML/Parser.pm line 197 at test.pl line 16". Any idea what might be going wrong? – Maxyie
