2012-06-05 21 views
1
sub parse_xml{ 
    my $xml_link = $_[0]; 
    my $xml_content = get($xml_link) or warn "Cant get XML page of " . $xml_link . "\n"; 
    if(!$xml_content){ 
     return; 
    } 
    my $xml = XML::Simple->new(KeepRoot => 1); 
    my $xml_data = $xml->XMLin($xml_content); 
    my @items = $xml_data->{rss}{channel}->{item}; 
    # print Dumper($xml_data); 
    foreach my $item (@items) { 
     if($item){ 
      print Dumper($item);    # This is the dump output 
      print $item->{author}; 
      #print $item . "\n"; 
     } 
    } 
} 

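When I try to output an item's author, I get either "Not a HASH reference at ... line ..." or HASH(memory address). How do you output an attribute using Perl and XML::Simple?

Am I doing this incorrectly? Why does it produce this error?

Here is the Dumper output.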

$VAR1 = [ 
      { 
      'link' => 'http://***.com/article/news/betty-white-credits-snickers-golden-opportunities/144290/#comments-67229', 
      'author' => {}, 
      'title' => 'By: ', 
      'pubDate' => 'Tue, 08 Jun 2010 12:47 EDT', 
      'description' => 'Interesting. At least SHE remembered the product that propelled her to recent recognition. When many people I know have commented on how they loved that Betty White Super Bowl spot, they can't recall the product. Ah, advertising.' 
      }, 
      { 
      'link' => 'http://***.com/article/news/betty-white-credits-snickers-golden-opportunities/144290/#comments-67167', 
      'author' => {}, 
      'title' => 'By: ', 
      'pubDate' => 'Mon, 07 Jun 2010 13:26 EDT', 
      'description' => 'Fun, fun, fun. A great attitude for all of us to take into our careers.' 
      }, 
      { 
      'link' => 'http://****.com/article/news/betty-white-credits-snickers-golden-opportunities/144290/#comments-67164', 
      'author' => 'username', 
      'title' => 'By: username', 
      'pubDate' => 'Mon, 07 Jun 2010 12:23 EDT', 
      'description' => 'Her appearance of the Comedy Central roast of William Shattner a couple of years ago was great... it seems like her willingness to be irreverent makes her more appealing to us all! 

www.adverspew.com' 
      }, 
      { 
      'link' => 'http://****.com/article/news/betty-white-credits-snickers-golden-opportunities/144290/#comments-67142', 
      'author' => {}, 
      'title' => 'By: ', 
      'pubDate' => 'Mon, 07 Jun 2010 09:50 EDT', 
      'description' => 'Solid interview. I will definitely be tuning into "Hot in Cleveland" next week. We ought to enjoy Ms. White's talents for as long as we have her. She's great!' 
      } 
     ]; 
+0

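This would be easier to answer if you printed the output of `Dumper($item->{author})`. In fact, doing so would probably have told you what the problem is in the first place. – DVK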

+0

Since the answers don't state it directly: any time you print the value of an expression in Perl and get `HASH(address)`, it means that whatever you have is a hashref. – DVK

+0

[XML::Feed](http://p3rl.org/XML::Feed) exists, so there is no need to write this custom parser. – daxim

Answers

1

You are very much on the right track. I ran your code against the news feed linked from this StackOverflow page and tweaked it slightly.

use strict;
use warnings;
use LWP::Simple;
use XML::Simple;
use Data::Dumper;

sub parse_xml {
    my $xml_link = $_[0];
    my $xml_content = get($xml_link) or warn "Can't get XML page of " . $xml_link . "\n";
    if (!$xml_content) {
        return;
    }
    my $xml = XML::Simple->new(KeepRoot => 1);
    my $xml_data = $xml->XMLin($xml_content, ForceArray => 'entry');
    # Elements are folded into array refs here, hence the [0] subscripts.
    foreach my $item ($xml_data->{'feed'}[0]->{'entry'}) {
        foreach my $entry (@{$item}) {
            if ($entry) {
                print $entry->{'author'}[0]->{'name'}[0] . "\n";
                print $entry->{'author'}[0]->{'uri'}[0] . "\n";
            }
        }
    }
}
parse_xml('http://stackoverflow.com/feeds/question/10906521');

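That works fine for that example. I suspect you are trying to print something that is not a plain value. In the StackOverflow feed you can see that 'author' actually contains child nodes, so if you print $item->{'author'} inside the foreach loop you will get the 'HASH' result you describe.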
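To make the 'HASH' result concrete, here is a minimal sketch that needs no network access. The `$item` structure and the example.com URI are made up for illustration; they only mimic an entry whose `<author>` has child elements.

```perl
use strict;
use warnings;

# Hypothetical item: <author> has child elements, so it parses to a hash ref.
my $item = {
    author => { name => 'username', uri => 'http://example.com/users/1' },
};

# Stringifying the reference itself yields text of the form HASH(0x...).
my $as_string = "$item->{author}";
print "whole element: $as_string\n";

# Drilling down to a leaf value gives the actual text.
print "name: $item->{author}{name}\n";
```

The address inside `HASH(0x...)` varies from run to run; what matters is that seeing it means you printed a reference rather than a leaf value.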

Looking at your dump and Borodin's sensible comment, this should work for you:

    my $xml_data = $xml->XMLin($xml_content, ForceArray => 'entry');
    my $item = $xml_data->{'rss'}[0]->{'channel'}[0]->{'item'};
    foreach my $entry (@{$item}) {
        if ($entry) {
            # Empty elements come back as references, so print only plain strings.
            if (!ref $entry->{'author'}[0]) {
                print $entry->{'author'}[0] . "\n";
            }
            if (!ref $entry->{'description'}[0]) {
                print $entry->{'description'}[0] . "\n";
            }
            if (!ref $entry->{'pubDate'}[0]) {
                print $entry->{'pubDate'}[0] . "\n";
            } # etc.
        }
    }
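The `@{...}` dereference is the step the question's code was missing: assigning `$xml_data->{rss}{channel}->{item}` straight to `@items` stores a single array reference, which is why the Dumper output starts with `[` and why `$item->{author}` fails with "Not a HASH reference". A small sketch of the difference, using a hand-built stand-in for the parsed feed (the data is hypothetical and assumes parsing without ForceArray):

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parsed feed, shaped like the question's dump.
my $xml_data = {
    rss => { channel => { item => [
        { author => 'username', title => 'By: username' },
        { author => {},         title => 'By: ' },
    ] } },
};

# Copying the reference: a one-element list that holds the array ref itself.
my @wrong = $xml_data->{rss}{channel}{item};

# Dereferencing with @{ ... }: one element per <item>.
my @items = @{ $xml_data->{rss}{channel}{item} };

printf "no dereference:   %d element(s), first element type: %s\n",
    scalar(@wrong), ref($wrong[0]);
printf "with dereference: %d element(s), first element type: %s\n",
    scalar(@items), ref($items[0]);
```

Note that by default XML::Simple returns a hash ref rather than an array ref when a feed contains only one `<item>`, which is the situation ForceArray is meant to guard against.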
1

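This RSS feed may or may not have <author> information for each item.

If there is no author, the element still appears in the XML but has no content. It shows up as <author></author>.

XML::Simple represents this as an empty anonymous hash.

So if an item has author information, $item->{author} will be a simple text string. Otherwise it will be a hash reference.

You can get around this by writing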

foreach my $item (@items) { 
    my $author = $item->{author}; 
    $author = '' if ref $author;    # empty <author></author> parses to a hash ref 
    print "$author\n"; 
} 
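The same check can be wrapped in a small helper so that every field is treated the same way. This is only a sketch: `text_of` and the sample `@items` are hypothetical names and data, shaped like the Dumper output in the question.

```perl
use strict;
use warnings;

# Hypothetical items shaped like the Dumper output in the question.
my @items = (
    { author => {},         pubDate => 'Tue, 08 Jun 2010 12:47 EDT' },
    { author => 'username', pubDate => 'Mon, 07 Jun 2010 12:23 EDT' },
);

# text_of() is an illustrative helper, not part of XML::Simple. It returns the
# field as a string, or a fallback when the field is missing or is a reference
# (which is how an empty element shows up).
sub text_of {
    my ($item, $field, $fallback) = @_;
    $fallback = '' unless defined $fallback;
    my $value = $item->{$field};
    return $fallback if !defined $value || ref $value;
    return $value;
}

for my $item (@items) {
    printf "%s wrote on %s\n",
        text_of($item, 'author', '(anonymous)'),
        text_of($item, 'pubDate');
}
```

XML::Simple also documents a `SuppressEmpty` option that changes how empty elements are represented, which may make the check unnecessary. Consult the module documentation before relying on it.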
+0

Thanks, but I solved this by checking whether $item is a hash. It seems inconsistent: sometimes it returns a hash and sometimes it doesn't. – carboncomputed

+0

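@user979663: I can't see how `$item` could be anything other than a hash reference. Do you mean checking `$item->{author}`? That will be either a string or a hash reference. My code checks whether it is a hash reference by using `if ref $author`. – Borodin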