2013-03-29 56 views
1

Difficulty parsing an XML file with Perl

I am trying to parse the XML manifest file (imsmanifest.xml) of an Articulate eLearning course.

An excerpt of the XML structure is given below (I want to drill down to adlcp:masteryscore):

<?xml version="1.0" encoding="UTF-8"?> 
<manifest xsi:schemaLocation="http://www.imsproject.org/xsd/imscp_rootv1p1p2 imscp_rootv1p1p2.xsd http://www.imsglobal.org/xsd/imsmd_rootv1p2p1 imsmd_rootv1p2p1.xsd http://www.adlnet.org/xsd/adlcp_rootv1p2 adlcp_rootv1p2.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:adlcp="http://www.adlnet.org/xsd/adlcp_rootv1p2" xmlns="http://www.imsproject.org/xsd/imscp_rootv1p1p2" version="1.0" identifier="Electrical_Design_Part_3"> 
    <metadata/> 
    <organizations default="Electrical_Design_Part_3_ORG"> 
     <organization identifier="Electrical_Design_Part_3_ORG"> 
     <title>Electrical Design - Part 3</title> 
     <item identifier="Electrical_Design_Part_3_SCO" identifierref="Articulate_Presenter_RES" isvisible="true"> 
      <title>Electrical Design - Part 3</title> 
      <adlcp:masteryscore>65</adlcp:masteryscore> 
     </item> 
     </organization> 
    </organizations> 
    <resources/> 
</manifest> 

I have tried both XML::Simple and XML::LibXML. I can get these modules to work with simple XML files, but not with the manifest file I actually need to parse.

The code below shows my attempt to drill down to the title tag with XML::LibXML:

use XML::LibXML; 
$filename = "imsmanifest.xml"; 
$parser = XML::LibXML->new(); 
$xmldoc = $parser->parse_file($filename); 

for my $sample ($xmldoc->findnodes('/manifest/organizations/organization/item/title')) { 
    for my $property ($sample->findnodes('./*')) { 
     print $property->nodeName(), ": ", $property->textContent(), "\n"; 
    } 
    print "\n"; 
}; 

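How does one handle the colon in the adlcp:masteryscore tag? Whenever I try to use it I get an error, but perhaps I am doing it wrong.

Could someone please show me the correct way to drill down to adlcp:masteryscore?

Many thanks.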

+1

By the way, always use 'use strict; use warnings;'! It makes no real difference here, but without them you are running with scissors. – ikegami

+0

Fixed the malformed XML. – ikegami

Answers

3

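You are asking for an element named manifest in the null namespace, but you need the element named manifest in the http://www.imsproject.org/xsd/imscp_rootv1p1p2 namespace. In XPath, an unprefixed name always means "no namespace", even when the document declares a default namespace.

Fix: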

use strict; 
use warnings; 

use XML::LibXML    qw(); 
use XML::LibXML::XPathContext qw(); 

my $xml_qfn = 'imsmanifest.xml'; 

my $parser = XML::LibXML->new(no_network => 1); 
my $doc = $parser->parse_file($xml_qfn); 

my $xpc = XML::LibXML::XPathContext->new(); 
$xpc->registerNs(a => "http://www.adlnet.org/xsd/adlcp_rootv1p2"); 
$xpc->registerNs(i => "http://www.imsproject.org/xsd/imscp_rootv1p1p2"); 

for my $item ($xpc->findnodes('/i:manifest/i:organizations/i:organization/i:item', $doc)) { 
    my $title = $xpc->find('i:title/text()', $item); 
    my $mastery = $xpc->find('a:masteryscore/text()', $item); 
    print "$title: $mastery\n"; 
} 

Note: The prefixes used in the XPath (a and i) are an arbitrary choice. You can pick whatever you like, just as you can when writing an XML document.
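To illustrate the note about prefixes, here is a minimal sketch. It assumes a made-up, cut-down document in which the ADL namespace is bound to foo rather than adlcp, and it registers the script-side prefixes ims and score, which are equally arbitrary. None of these names come from the question.

```perl
use strict;
use warnings;

use XML::LibXML               qw();
use XML::LibXML::XPathContext qw();

# Hypothetical cut-down manifest: the ADL namespace is bound to "foo" here,
# not to "adlcp" as in the question's file.
my $xml = <<'XML';
<manifest xmlns="http://www.imsproject.org/xsd/imscp_rootv1p1p2"
          xmlns:foo="http://www.adlnet.org/xsd/adlcp_rootv1p2">
  <organizations><organization><item>
    <foo:masteryscore>65</foo:masteryscore>
  </item></organization></organizations>
</manifest>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

my $xpc = XML::LibXML::XPathContext->new($doc);
# The prefixes "ims" and "score" exist only in this script.
$xpc->registerNs(ims   => 'http://www.imsproject.org/xsd/imscp_rootv1p1p2');
$xpc->registerNs(score => 'http://www.adlnet.org/xsd/adlcp_rootv1p2');

# An unprefixed name in XPath means "no namespace", so this matches nothing.
my @unprefixed = $xpc->findnodes('/manifest');

# Matching is done on namespace URIs, so the differing prefixes do not matter.
my $mastery = $xpc->findvalue('//ims:item/score:masteryscore');

print "unprefixed matches: ", scalar(@unprefixed), "\n";
print "mastery score: $mastery\n";
```

The unprefixed query should report zero matches, while the prefixed one should find 65 even though the document and the script use different prefixes for the same namespace.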

Note: I added no_network => 1 to prevent libxml from fetching the DTD over the network every time an XML document is parsed.

0

First, fix your example so that it is well-formed XML:

<?xml version="1.0" encoding="UTF-8"?> 
<manifest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:adlcp="http://www.adlnet.org/xsd/adlcp_rootv1p2" xmlns="http://www.imsproject.org/xsd/imscp_rootv1p1p2" xsi:schemaLocation="http://www.imsproject.org/xsd/imscp_rootv1p1p2 imscp_rootv1p1p2.xsd http://www.imsglobal.org/xsd/imsmd_rootv1p2p1 imsmd_rootv1p2p1.xsd http://www.adlnet.org/xsd/adlcp_rootv1p2 adlcp_rootv1p2.xsd" version="1.0" identifier="Electrical_Design_Part_3"> 
    <metadata> 
    <organizations default="Electrical_Design_Part_3_ORG"> 
     <organization identifier="Electrical_Design_Part_3_ORG"> 
     <title>Electrical Design - Part 3</title> 
     <item identifier="Electrical_Design_Part_3_SCO" identifierref="Articulate_Presenter_RES" isvisible="true"> 
      <title>Electrical Design - Part 3</title> 
      <adlcp:masteryscore>65</adlcp:masteryscore> 
     </item> 
     </organization> 
    </organizations> 
    <resources/> 
</metadata> 
</manifest> 

Fire up the Perl debugger:

DB<2> use XML::Simple 

    DB<3> $x=XMLin("example.xml") 

    DB<4> x $x 
0 HASH(0x2733c48) 
    'identifier' => 'Electrical_Design_Part_3' 
    'metadata' => HASH(0x2733828) 
     'organizations' => HASH(0x2733288) 
     'default' => 'Electrical_Design_Part_3_ORG' 
     'organization' => HASH(0x272d7e8) 
      'identifier' => 'Electrical_Design_Part_3_ORG' 
      'item' => HASH(0x27285f8) 
       'adlcp:masteryscore' => 65 
       'identifier' => 'Electrical_Design_Part_3_SCO' 
       'identifierref' => 'Articulate_Presenter_RES' 
       'isvisible' => 'true' 
       'title' => 'Electrical Design - Part 3' 
      'title' => 'Electrical Design - Part 3' 
     'resources' => HASH(0x27333d8) 
      empty hash 
    'version' => 1.0 
    'xmlns' => 'http://www.imsproject.org/xsd/imscp_rootv1p1p2' 
    'xmlns:adlcp' => 'http://www.adlnet.org/xsd/adlcp_rootv1p2' 
    'xmlns:xsi' => 'http://www.w3.org/2001/XMLSchema-instance' 
    'xsi:schemaLocation' => 'http://www.imsproject.org/xsd/imscp_rootv1p1p2 imscp_rootv1p1p2.xsd http://www.imsglobal.org/xsd/imsmd_rootv1p2p1 imsmd_rootv1p2p1.xsd http://www.adlnet.org/xsd/adlcp_rootv1p2 adlcp_rootv1p2.xsd' 

    DB<6> x keys %$x 
0 'xmlns' 
1 'xmlns:xsi' 
2 'identifier' 
3 'version' 
4 'metadata' 
5 'xsi:schemaLocation' 
6 'xmlns:adlcp' 
    DB<9> x keys %{$x->{metadata}} 
0 'resources' 
1 'organizations' 
    DB<10> x keys %{$x->{metadata}{organizations}} 
0 'default' 
1 'organization' 
    DB<11> x keys %{$x->{metadata}{organizations}{organizations} 
Missing right curly or square bracket at (eval 22)[/usr/share/perl/5.14/perl5db.pl:640] line 4, at end of line 
syntax error at (eval 22)[/usr/share/perl/5.14/perl5db.pl:640] line 4, at EOF 
    DB<12> x keys %{$x->{metadata}{organizations}{organizations}} 
    empty array 
    DB<13> x keys %{$x->{metadata}{organizations}{organization}} 
0 'identifier' 
1 'item' 
2 'title' 
    DB<14> x keys %{$x->{metadata}{organizations}{organization}{item}} 
0 'identifier' 
1 'identifierref' 
2 'isvisible' 
3 'title' 
4 'adlcp:masteryscore' 
    DB<19> x $x->{metadata}{organizations}{organization}{item}{'adlcp:masteryscore'} 
0 65 
    DB<20> 

So all you need to do is:

use XML::Simple; 
# Parse the manifest into nested hashes, then follow the keys found above. 
my $x = XMLin("example.xml"); 
print $x->{metadata}{organizations}{organization}{item}{'adlcp:masteryscore'}, "\n"; 

Hope this helps.

+0

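This fails as soon as a second organization or item is added. It also fails if the given document uses 'foo:masteryscore' instead of 'adlcp:masteryscore', which would be perfectly acceptable. You should never rely on the prefix, only on the namespace. XML::Simple is the hardest XML parser to use. – ikegami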
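The first failure mode mentioned in the comment above can be reproduced with a small sketch. The two XML fragments are made up for illustration (reduced from the question's manifest), and the code assumes XML::Simple's default options apart from the last call.

```perl
use strict;
use warnings;

use XML::Simple;

# Hypothetical fragments, reduced from the manifest in the question.
my $one_item  = '<organization><item identifier="A"><title>First</title></item></organization>';
my $two_items = '<organization>'
              . '<item identifier="A"><title>First</title></item>'
              . '<item identifier="B"><title>Second</title></item>'
              . '</organization>';

# With default options a lone <item> becomes a hash, repeated ones an array.
my $single = XMLin($one_item);
my $double = XMLin($two_items);
print "one item:  ", ref($single->{item}), "\n";
print "two items: ", ref($double->{item}), "\n";

# ForceArray pins the shape so the same access path works for both inputs.
my $forced = XMLin($one_item, ForceArray => ['item'], KeyAttr => []);
print "forced:    ", ref($forced->{item}), "\n";
```

Code that reads $x->{...}{item}{'adlcp:masteryscore'} therefore only works while there is exactly one item. ForceArray makes the shape predictable, but it does nothing about the prefix problem, which needs a namespace-aware parser.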

0

The XML is invalid: you need closing tags for the metadata and resources elements.

After that, XML::Simple will work with this code:

#!/usr/bin/env perl 

use strict; 
use warnings; 
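use XML::Simple qw(:strict); # strict mode requires ForceArray and KeyAttr to be set explicitly 
use Data::Dumper; 

# Parse without array forcing or key folding, then dump the resulting structure. 
my $ref = XMLin('test.xml', ForceArray => [], KeyAttr => {}); 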
print STDERR Dumper $ref; 
+0

This is very difficult with XML::Simple. It treats 'foo:masteryscore' as different from 'adlcp:masteryscore' when they may well be the same. XML::Simple is the hardest XML parser to use. – ikegami