Processing XML with Perl's XML::LibXML is very slow

The XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<resource-data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="resource-data.xsd">
    <class name="AP">
        <attributes>
            <resourceId>00 11 B5 1B 6D 20</resourceId>
            <lastModifyTime>20130107091545</lastModifyTime>
            <dcTime>20130107093019</dcTime>
            <attribute name="NMS_ID" value="DNMS" />
            <attribute name="IP_ADDR" value="10.11.141.111" />
            <attribute name="LABEL_DEV" value="00 11 B5 1B 6D 20" />
        </attributes>
        <attributes>
            <resourceId>00 11 B5 1B 6D 21</resourceId>
            <lastModifyTime>20130107091546</lastModifyTime>
            <dcTime>20130107093019</dcTime>
            <attribute name="NMS_ID" value="DNMS" />
            <attribute name="IP_ADDR" value="10.11.141.112" />
            <attribute name="LABEL_DEV" value="00 11 B5 1B 6D 21" />
        </attributes>
    </class>
</resource-data>

And here is my code:

#!/usr/bin/perl 

use Encode; 
use XML::LibXML; 
use Data::Dumper; 

$parser = new XML::LibXML; 
$struct = $parser->parse_file("d:/AP_201301073100_1.xml"); 

my $file_data = "d:\\ap.txt"; 
open IN, ">$file_data"; 

$rootel = $struct->getDocumentElement(); 
$elname = $rootel->getName(); 

@kids = $rootel->getElementsByTagName('attributes'); 
foreach $child (@kids) {
    @atts = $child->getElementsByTagName('attribute');
    foreach $at (@atts) {
        $va = $at->getAttribute('value');
        print IN encode("gbk", "$va\t");
    }
    print IN encode("gbk", "\n");
}
close(IN);

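My problem is that when the XML file is only 80 MB the program is very fast, but when the XML file is much larger the program becomes very slow. Can someone help me speed it up?

Source

2013-01-08 John

I think a [stream](http://coldattic.info/shvedsky/pro/blogs/a-foo-walking-into-bar/posts/55)-based parser is recommended for large XML files –

I just need to know how to modify my program; can you help me? – John

btw, use 'open IN, ">:encoding(gbk)", $file_data;' instead of calling encode all over the place. – ikegami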
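
A minimal sketch of what ikegami's comment suggests, assuming the rows have already been collected into an array of array references. The helper name write_rows and the sample row are hypothetical and not part of the original program. The point is that the :encoding(gbk) layer on the file handle converts everything printed to it, so no explicit encode() call is needed at each print.

```perl
use strict;
use warnings;

# write_rows() is a hypothetical helper used only for illustration.
# It writes each row as one tab-separated line, encoded as GBK by the
# I/O layer attached to the handle.
sub write_rows {
    my ($path, $rows) = @_;

    # Three-argument open with an encoding layer: every string printed
    # to $out is converted to GBK automatically.
    open my $out, '>:encoding(gbk)', $path
        or die "Cannot open $path: $!";

    for my $row (@$rows) {
        print {$out} join("\t", @$row), "\n";
    }

    close $out or die "Cannot close $path: $!";
    return;
}

# Runs only when an output path is given on the command line.
write_rows($ARGV[0], [ [ 'DNMS', '10.11.141.111', '00 11 B5 1B 6D 20' ] ])
    if @ARGV;
```

A lexical file handle with an error check also avoids the silent failure that the bareword 'open IN, ">$file_data";' would have if the file could not be created.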

Another possibility is to use XML::LibXML::Reader. It works similarly to SAX, but uses the same libxml library as XML::LibXML:

#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(location => '1.xml')
    or die "Cannot read 1.xml\n";

# The encoding layer converts everything printed to $OUT to GBK.
open my $OUT, '>:encoding(gbk)', '1.out' or die $!;

# Pull nodes one at a time; only <attributes> start tags are of interest.
while ($reader->read) {
    attr($reader) if 'attributes' eq $reader->name
        and XML_READER_TYPE_ELEMENT == $reader->nodeType;
}

# Collect the value of every <attribute> up to the closing </attributes>.
sub attr {
    my $reader = shift;
    my @kids;
    ATTRIBUTE:
    while ($reader->read) {
        my $name = $reader->name;
        last ATTRIBUTE if 'attributes' eq $name;
        next ATTRIBUTE if XML_READER_TYPE_END_ELEMENT == $reader->nodeType;
        push @kids, $reader->getAttribute('value')
            if 'attribute' eq $name;
    }
    print {$OUT} join("\t", @kids), "\n";
}

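Source

2013-01-08 09:05:45 choroba

Thanks, this approach is very fast; it converts the XML file to txt in about one minute. – John

If your XML file is large (80 MB or more), you can't parse the whole file into memory: first, it is very slow, and second, it will eventually run out of memory and the program will crash.

I suggest using XML::Twig and rewriting your code with callbacks.

Source

2013-01-08 07:06:37 mvp

Using XML::Twig will allow you to process each <attributes> element as it is encountered during parsing, and then discard the XML data that is no longer needed.

This program seems to do what you need.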

use strict;
use warnings;

use XML::Twig;

use constant XML_FILE => 'S:/AP_201301073100_1.xml';
use constant OUT_FILE => 'D:/ap.txt';

# The encoding layer makes explicit encode() calls unnecessary.
open my $outfh, '>:encoding(gbk)', OUT_FILE or die $!;

# Call attributes() each time a complete <attributes> element has been parsed.
my $twig = XML::Twig->new(twig_handlers => {attributes => \&attributes});
$twig->parsefile(XML_FILE);

sub attributes {
    my ($twig, $atts) = @_;
    my @values = map $_->att('value'), $atts->children('attribute');
    print $outfh join("\t", @values), "\n";
    $twig->purge;    # release the elements that have already been handled
}

Output

DNMS 10.11.141.111 00 11 B5 1B 6D 20 
DNMS 10.11.141.112 00 11 B5 1B 6D 21

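Source

2013-01-08 07:19:54 Borodin

Thanks, I tested it and it works, but it takes more time than choroba's answer. Thanks, you are both brilliant – John

For large XML files you have to use a stream-based parser such as XML::SAX, because a DOM parser builds the entire XML structure in memory.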
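
This answer gives no code, so here is a rough sketch of what a SAX-based version might look like, assuming the XML::SAX distribution is installed. The class name AttributeRows and the wrapper extract_values are hypothetical names chosen for this sketch. The handler keeps only the current row in memory and prints it when the closing </attributes> tag arrives.

```perl
use strict;
use warnings;

# AttributeRows is a hypothetical handler class written for this sketch.
package AttributeRows;
use base 'XML::SAX::Base';

# Start a new row at <attributes>; collect the value of each <attribute>.
sub start_element {
    my ($self, $el) = @_;
    if ($el->{LocalName} eq 'attributes') {
        $self->{row} = [];
    }
    elsif ($el->{LocalName} eq 'attribute' and $self->{row}) {
        # SAX2 keys attributes as "{namespace}name"; with no namespace
        # the key for the value attribute is "{}value".
        my $attr = $el->{Attributes}{'{}value'};
        push @{ $self->{row} }, $attr->{Value} if $attr;
    }
    return;
}

# At </attributes>, print the finished row and forget it.
sub end_element {
    my ($self, $el) = @_;
    return unless $el->{LocalName} eq 'attributes' and $self->{row};
    print { $self->{out} } join("\t", @{ delete $self->{row} }), "\n";
    return;
}

package main;
use XML::SAX::ParserFactory;

# extract_values() is a hypothetical wrapper: it streams the file at
# $path and writes one tab-separated row per <attributes> to $out_fh.
sub extract_values {
    my ($path, $out_fh) = @_;
    my $handler = AttributeRows->new;
    $handler->{out} = $out_fh;
    my $parser = XML::SAX::ParserFactory->parser(Handler => $handler);
    $parser->parse_uri($path);
    return;
}

# Runs only when an input file is given on the command line.
extract_values($ARGV[0], \*STDOUT) if @ARGV;
```

To get GBK output as in the question, the handle passed to extract_values can be opened with the '>:encoding(gbk)' layer. Which SAX driver is used depends on the installation, and speed will vary with it.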

Source

2013-01-08 07:22:37

Another way, with XML::Rules:

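use strict;
use warnings;

use XML::Rules;
use Data::Dumper;

my @rules = (
    attribute => [ attributes => sub { print "$_[1]{value}\n"; return } ],
    _default => undef,
);

# Assumption: the XML text is read from standard input; the original
# snippet used $xml without defining it.
my $xml = do { local $/; <STDIN> };

my $xr = XML::Rules->new(rules => \@rules);
my $data = $xr->parse($xml);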

Source

2013-01-08 18:13:57 runrig
