2013-07-24
1

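Perl XML::DOM::Parser

I am trying to get some information from an RSS feed using Perl, XML::DOM, and XML::Parser. I am having a hard time finding documentation for XML::DOM and XML::Parser.

Here is the RSS feed output: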

<rss version="2.0"> 
<channel> 
    <item> 
     <title>The title numer 1</title> 
     <link> 
     http://www.example.com/link1.php?getfile=1&sha=1234567890 
     </link> 
     <description> 
     File 1 
     </description> 
    </item> 
    <item> 
     <title>The title numer 2</title> 
     <link> 
     http://www.example.com/link1.php?getfile=2&sha=0192837465 
     </link> 
     <description> 
     File 2 
     </description> 
    </item> 
     <item> 
     <title>The title numer 3</title> 
     <link> 
     http://www.example.com/link1.php?getfile=1&sha=0987654321 
     </link> 
     <description> 
     File 3 
     </description> 
    </item> 
</channel> 

So I want to get the "title" and the "link" of each item in this RSS feed.

I cannot use XML::LibXML, XML::Simple, or XML::RSS.

Answers

1

I get errors when trying to install it, so this is untested, but it should look something like this:

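use XML::DOM      qw();  # XML::DOM::Parser is defined inside XML::DOM
use XML::XQL      qw(); 
use XML::XQL::DOM qw(); 

my $parser = XML::DOM::Parser->new(); 
my $doc = $parser->parsefile("file.xml"); 

# The root element of the feed is <rss>, so the path starts there.
for my $item_node ($doc->xql('/rss/channel/item')) { 
    # xql() returns node objects; getData() extracts the text of each one.
    my $title = join '', map { $_->getData } $item_node->xql('title/textNode()'); 
    my $link  = join '', map { $_->getData } $item_node->xql('link/textNode()'); 
    ... 
} 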
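
If XML::XQL will not install, the same extraction can probably be done with XML::DOM's own methods. The following is a minimal sketch, not a tested answer: the helper names `child_text` and `titles_and_links` and the inline sample feed are made up for illustration, and the feed is assumed to have its `&` characters already escaped as `&amp;`.

```perl
use strict;
use warnings;
use XML::DOM;    # provides XML::DOM::Parser and the node-type constants

# Hypothetical helper: trimmed text of the first <$tag> element below $node.
sub child_text {
    my ($node, $tag) = @_;
    my ($elem) = $node->getElementsByTagName($tag);
    return '' unless defined $elem;

    # Join all text children, since the parser may split the text.
    my $text = join '',
        map  { $_->getData }
        grep { $_->getNodeType == TEXT_NODE } $elem->getChildNodes;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical helper: list of [title, link] pairs, one per <item>.
sub titles_and_links {
    my ($xml) = @_;
    my $doc   = XML::DOM::Parser->new->parse($xml);
    my @items = map { [ child_text($_, 'title'), child_text($_, 'link') ] }
                $doc->getElementsByTagName('item');
    $doc->dispose;    # break circular references held by the DOM tree
    return @items;
}

# Sample feed, assumed for illustration only.
my $feed = <<'XML';
<rss version="2.0">
<channel>
    <item>
     <title>The title numer 1</title>
     <link>
     http://www.example.com/link1.php?getfile=1&amp;sha=1234567890
     </link>
    </item>
</channel>
</rss>
XML

for my $item (titles_and_links($feed)) {
    my ($title, $link) = @$item;
    print "title = '$title'\n";
    print "link = '$link'\n";
}
```

To read from a file instead, `parsefile("file.xml")` can replace `parse($xml)`, as in the snippet above.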
Why the downvote? – ikegami

0

There is a problem parsing your RSS XML file. For a file such as

<xml> 
<channel> 
    <item> 
     <title>The title numer 1</title> 
     </item> 

    <item> 
     <title>The title numer 2</title> 
     </item> 
</channel> 
</xml> 

you can do:

use strict; 
use warnings; 
use XML::Parser; 
use Data::Dumper; 
use XML::DOM::Lite qw(Parser XPath); 

my $parser = Parser->new(); 
my $doc = $parser->parseFile('2.xml', whitespace => 'strip'); 


#XML::DOM::Lite::NodeList - blessed array ref for containing Node objects 
my $nlist = $doc->selectNodes('/xml/channel/item/title'); 


foreach my $node (@{$nlist}) 
{ 
    print $node->firstChild()->nodeValue() . "\n"; 
} 
0

There is a problem with your XML data (an unescaped '&' character).

Lines such as

...getfile=1&sha... 

must be written as

...getfile=1&amp;sha... 
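
If the feed cannot be corrected at its source, one possible workaround is to escape the bare ampersands yourself before handing the text to a parser. This is only a sketch: `escape_bare_ampersands` is a made-up helper name, and the regex assumes that any `&` followed by a name or number and a `;` is already a valid reference, which can misfire on unusual URLs.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every '&' that does not already begin an
# entity or character reference (such as &amp; or &#38;) into '&amp;'.
sub escape_bare_ampersands {
    my ($xml) = @_;
    $xml =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $xml;
}

my $raw = 'http://www.example.com/link1.php?getfile=1&sha=1234567890';
print escape_bare_ampersands($raw), "\n";
```

Text that is already escaped passes through unchanged, so the helper can be applied to a feed more than once.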

Once this is fixed, you can use XML::Reader::PP to parse the XML:

use strict; 
use warnings; 

use XML::Reader::PP; 

my $rdr = XML::Reader::PP->new(\*DATA, { mode => 'branches' }, 
    { root => '/rss/channel/item', branch => [ '/title', '/link' ] }); 

while ($rdr->iterate) { 
    my ($title, $link) = $rdr->value; 

    for ($title, $link) { 
     $_ = '' unless defined $_; 
    } 

    print "title = '$title'\n"; 
    print "link = '$link'\n"; 
} 

__DATA__ 
<rss version="2.0"> 
    <channel> 
    <item> 
     <title>The title numer 1</title> 
     <link> 
     http://www.example.com/link1.php?getfile=1&amp;sha=1234567890 
     </link> 
     <description> 
     File 1 
     </description> 
    </item> 
    <item> 
     <title>The title numer 2</title> 
     <link> 
     http://www.example.com/link1.php?getfile=2&amp;sha=0192837465 
     </link> 
     <description> 
     File 2 
     </description> 
    </item> 
     <item> 
     <title>The title numer 3</title> 
     <link> 
     http://www.example.com/link1.php?getfile=1&amp;sha=0987654321 
     </link> 
     <description> 
     File 3 
     </description> 
    </item> 
    </channel> 
</rss>