2015-03-25 15 views
1

I have been trying to split an XML file using the XML::LibXML module, but it throws this error:

Can't call method "findnodes" without a package or object reference 

My input XML:

<xml> 
    <bhap id="1"> 
    <label>cylind - I</label> 
    <title>premier</title> 
    <rect id="S1"> 
     <title>Short</title> 
     <label>1.</label> 
     <p><text>welcome</text></p> 
    </rect> 
    <rect id="S2"> 
     <title>Definite</title> 
     <label>2.</label> 
     <p><text>welcome1</text></p> 
    </rect> 
    </bhap> 
    <bhap id="2"> 
    <label>cylind – II</label> 
    <title>AUTHORITIES AND ITS EMPLOYEES</title> 
    <rect id="S3"> 
     <title>nauty.&#x2014;</title> 
     <label>3.</label> 
     <p><text>welcome3</text></p> 
    </rect> 
    <rect id=S4"> 
     <title>Term</title> 
     <label>4.</label> 
     <p><text>welcome4</text></p> 
    </rect> 
    </bhap> 
</xml> 

Required output:

File 1

<xml> 
    <bhap id="1"> 
    <label>cylind - I</label> 
    <title>premier</title> 
    <rect id="S1"> 
     <title>Short</title> 
     <label>1.</label> 
     <p><text>welcome</text></p> 
    </rect> 
    </bhap> 
</xml> 

File 2

<xml> 
    <bhap id="1"> 
    <label>cylind - I</label> 
    <title>premier</title> 
    <rect id="S2"> 
     <title>Definite</title> 
     <label>2.</label> 
     <p><text>welcome1</text></p> 
    </rect> 
    </bhap> 
</xml> 

File 3

<xml> 
    <bhap id="2"> 
    <label>cylind – II</label> 
    <title>AUTHORITIES AND ITS EMPLOYEES</title> 
    <rect id="S3"> 
     <title>nauty.&#x2014;</title> 
     <label>3.</label> 
     <p><text>welcome3</text></p> 
    </rect> 
    </bhap> 
</xml> 

File 4

<xml>  
    <bhap id="2"> 
    <label>cylind – II</label> 
    <title>AUTHORITIES AND ITS EMPLOYEES</title> 
    <rect id=S4"> 
     <title>Term</title> 
     <label>4.</label> 
     <p><text>welcome4</text></p> 
    </rect> 
    </bhap> 
</xml> 

My code:

use XML::LibXML; 

my $file = shift || die "usage $0 <xmlfile>"; 
my $parser = XML::LibXML->new(); 
my $doc = $parser->parse_file($file); 

my @nodes = $doc->findnodes('//bhap'); 
foreach my $node1 (@nodes) { 

    my $bhap = $node1->toString(), "\n"; 

    if ($bhap =~ m/(<bhap.+?>.+?<\/title>)(.+?)(<\/bhap>)/is) { 

     my $bhap1 = $1; 
     my $bhap2 = $2; 
     my $bhap3 = $3; 

     my $nodes1 = $bhap->findnodes('//rect'); 
     foreach my $node (@$nodes1) { 

      my $rect = $node->toString(); 

      if ($rect =~ m/(<rect\s*id="(.+?)">.+?<\/rect>)/is) { 

       my $var1 = $1; 
       my $var2 = $2; 

       print "file" $var2; 
       print "<xml>" print $bhap1; 
       print $var1; 
       print $bhap3; 
       print "</xml>"; 
      } 
     } 
    } 
} 
+0

Is xml_split an option: http://search.cpan.org/dist/XML-Twig/tools/xml_split/xml_split – Sobrique 2015-03-25 09:36:39

+1

You assign to `$bhap` etc. and then read from `$bhap`. Use `use warnings; use strict;` to catch things like this. – reinierpost 2015-03-25 11:21:07

+1

`my $nodes1 = $bhap->findnodes('//rect');` Here you are calling `findnodes` on a string. – nwellnhof 2015-03-25 11:56:00

Answers

1

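Well, you started off well, but then fell into the "regular expression" trap. Parsing XML with regular expressions is a bad idea because it is too complicated: to do it properly you need to handle and validate tag nesting, line breaks and all sorts of other things, which leaves you with a brittle regex. So please don't.

Most importantly, always turn on `use strict;` and `use warnings;` before posting a question. They are your first port of call for troubleshooting.

If you had, you would have seen problems such as: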

print "file" $var2; 

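That is not going to work, at all. A few other things in your code will not work correctly either, and fixing those would be the starting point.

Also, your XML is invalid: the 'S4' attribute is, I think, missing a quote.

Anyway, assuming that is just a typo, I would start with XML::Twig (because I know it better than XML::LibXML, not for any specific reason) and do something like this: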

#!/usr/bin/perl 

use strict; 
use warnings; 
use XML::Twig; 

my %children_of; 

#as we process, extract all the 'rect' elements - along with a reference to their context. 
sub process_rect { 
    my ($twig, $rect) = @_; 
    push(@{ $children_of{ $rect->parent } }, $rect->cut); 
} 


my $twig = XML::Twig->new(
    'pretty_print' => 'indented', 
    'twig_handlers' => { 'rect' => \&process_rect }, 

); 

$twig->parse(\*DATA); 

#run through all the 'bhap' elements. 
foreach my $bhap ($twig->root->children('bhap')) {
    #find the rect elements under this bhap.
    foreach my $rect (@{ $children_of{$bhap} }) {
        #create a new XML document - copy the 'root' name from your original document.
        my $xml = XML::Twig::Elt->new($twig->root->name);
        #duplicate this 'bhap' element by copying it, rather than cutting it,
        #so we can paste it more than once (e.g. per 'rect')
        my $subset = $bhap->copy;
        #insert the 'bhap' into our new xml.
        $subset->paste(last_child => $xml);
        #insert our cut rect beneath this bhap.
        $rect->paste(last_child => $subset);

        #print the resulting XML.
        print "--\n";
        $xml->print;
    }
}

__DATA__ 
<xml> 

<bhap id="1"> 
       <label>cylind - I</label> 
       <title>premier</title> 
       <rect id="S1"> 
        <title>Short</title> 
        <label>1.</label> 
        <p><text>welcome</text></p> 
       </rect> 
       <rect id="S2"> 
        <title>Definite</title> 
        <label>2.</label> 
        <p><text>welcome1</text></p> 
       </rect> 
     </bhap> 
      <bhap id="2"> 
       <label>cylind - II</label> 
       <title>AUTHORITIES AND ITS EMPLOYEES</title> 

       <rect id="S3"> 
        <title>nauty.&#x2014;</title> 
        <label>3.</label> 
        <p><text>welcome3</text></p> 
       </rect> 

       <rect id="S4"> 
        <title>Term</title> 
        <label>4.</label> 
        <p><text>welcome4</text></p> 
       </rect></bhap> 

</xml> 

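We pre-process the XML and 'cut' the rect nodes out. Then we loop over each of the bhap nodes, copy it, and paste the relevant rect beneath the copy.

This gives the output: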

-- 
<xml> 
    <bhap id="1"> 
    <label>cylind - I</label> 
    <title>premier</title> 
    <rect id="S1"> 
     <title>Short</title> 
     <label>1.</label> 
     <p> 
     <text>welcome</text> 
     </p> 
    </rect> 
    </bhap> 
</xml> 
-- 
<xml> 
    <bhap id="1"> 
    <label>cylind - I</label> 
    <title>premier</title> 
    <rect id="S2"> 
     <title>Definite</title> 
     <label>2.</label> 
     <p> 
     <text>welcome1</text> 
     </p> 
    </rect> 
    </bhap> 
</xml> 
-- 
<xml> 
    <bhap id="2"> 
    <label>cylind - II</label> 
    <title>AUTHORITIES AND ITS EMPLOYEES</title> 
    <rect id="S3"> 
     <title>nauty.—</title> 
     <label>3.</label> 
     <p> 
     <text>welcome3</text> 
     </p> 
    </rect> 
    </bhap> 
</xml> 
-- 
<xml> 
    <bhap id="2"> 
    <label>cylind - II</label> 
    <title>AUTHORITIES AND ITS EMPLOYEES</title> 
    <rect id="S4"> 
     <title>Term</title> 
     <label>4.</label> 
     <p> 
     <text>welcome4</text> 
     </p> 
    </rect> 
    </bhap> 
</xml> 

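That looks at least fairly close to what you want to produce. I have skipped reading the file and writing the output, because rebuilding the XML is the harder part.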
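The skipped file handling could be sketched roughly as follows. This is only an illustration under assumptions: the helper name `split_to_files` and the output names `file1.xml`, `file2.xml`, ... are made up here, and the rects are cut after parsing rather than in a `twig_handlers` callback as in the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the original answer): splits $file and writes
# file1.xml, file2.xml, ... into $outdir; returns the names it wrote.
sub split_to_files {
    my ( $file, $outdir ) = @_;

    my $twig = XML::Twig->new( pretty_print => 'indented' );
    $twig->parsefile($file);

    my @written;
    foreach my $bhap ( $twig->root->children('bhap') ) {
        # Cut the rects out first, so that copies of $bhap carry none of them.
        my @rects = $bhap->children('rect');
        $_->cut for @rects;

        foreach my $rect (@rects) {
            # New root with the same name, a copy of the bhap, then one rect.
            my $xml    = XML::Twig::Elt->new( $twig->root->name );
            my $subset = $bhap->copy;
            $subset->paste( last_child => $xml );
            $rect->paste( last_child => $subset );

            my $name = "$outdir/file" . ( @written + 1 ) . '.xml';
            open my $fh, '>:encoding(UTF-8)', $name
                or die "Cannot write $name: $!";
            $xml->print($fh);
            close $fh or die "Cannot close $name: $!";
            push @written, $name;
        }
    }
    return @written;
}

# Run only when an input file is given, e.g.: perl split.pl input.xml
if (@ARGV) {
    print "$_\n" for split_to_files( $ARGV[0], '.' );
}
```

The input must be well-formed for `parsefile` to succeed, so the missing quote on the 'S4' attribute has to be fixed first.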

I would also suggest looking at xml_split, which ships with XML::Twig, because it may be exactly what you want anyway.
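As for the error message itself: `toString` returns a plain string, so calling `findnodes` on `$bhap` fails with "Can't call method "findnodes" without a package or object reference". If you would rather stay with XML::LibXML, a minimal sketch could look like the one below. The helper name `split_rects` and the shortened inline sample are assumptions made for illustration; they are not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not from the original post): returns one new
# XML::LibXML::Document per <rect>, each holding the parent <bhap>
# with only that single <rect> left in it.
sub split_rects {
    my ($doc) = @_;
    my $root_name = $doc->documentElement->nodeName;
    my @pieces;

    foreach my $bhap ( $doc->findnodes('//bhap') ) {
        # findnodes is called on the node object, never on $bhap->toString.
        # './rect' is relative to this bhap; '//rect' would search the whole document.
        my @rects = $bhap->findnodes('./rect');

        foreach my $keep ( 0 .. $#rects ) {
            # Deep-copy the bhap, then remove every rect except number $keep.
            my $copy   = $bhap->cloneNode(1);
            my @copied = $copy->findnodes('./rect');
            foreach my $i ( 0 .. $#copied ) {
                $copy->removeChild( $copied[$i] ) unless $i == $keep;
            }

            # Wrap the trimmed copy in a fresh document with the same root name.
            my $out  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
            my $root = $out->createElement($root_name);
            $out->setDocumentElement($root);
            $out->adoptNode($copy);
            $root->appendChild($copy);

            push @pieces, $out;
        }
    }
    return @pieces;
}

# Shortened inline sample, with all attribute quotes in place.
my $sample = <<'XML';
<xml>
  <bhap id="1">
    <title>premier</title>
    <rect id="S1"><title>Short</title></rect>
    <rect id="S2"><title>Definite</title></rect>
  </bhap>
  <bhap id="2">
    <title>AUTHORITIES AND ITS EMPLOYEES</title>
    <rect id="S3"><title>Term</title></rect>
  </bhap>
</xml>
XML

my $doc = XML::LibXML->load_xml( string => $sample );
my $n   = 0;
foreach my $piece ( split_rects($doc) ) {
    print "-- file", ++$n, "\n", $piece->toString, "\n";
}
```

To write files instead of printing, each returned document can be saved with `$piece->toFile($name)`. No regular expressions are needed, because the tree is edited directly.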

+0

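I am sure this is all good advice, but the problem as stated is the error "Can't call method "findnodes" without a package or object reference", and you said nothing about it. – fortboise 2018-03-09 15:15:53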
