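I have to parse several XML files with Perl and store the values in a hash. If possible, I would like to filter out certain attributes. Later in my code I pull the data out of the hash and insert it into a database. What is the best way to parse complex XML with Perl?
I have been using XML::Parser, but I would prefer to parse into a hash rather than handle every tag as it is encountered. Any suggestions?
I want to skip any path that has the attribute kind="dir". I need the author, date, msg, and the file type (file extension) of each path. There can be any number of <path> tags, each with kind="file" or kind="dir". There can also be multiple <logentry> tags.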
<?xml version="1.0" encoding="UTF-8"?>
<log>
  <logentry revision="3989">
    <author>cergyl</author>
    <date>2013-07-19T05:31:01.212620Z</date>
    <paths>
      <path action="M" kind="dir">/team.admin/trunk/auth.conf</path>
    </paths>
    <path action="M" kind="file">/team.admin/trunk/file.cpp</path>
    <msg>Whitespace change to verify repository synchronization</msg>
  </logentry>
</log>
use strict;
use warnings;
use XML::Parser;

# Attributes of the most recently opened element, plus its text under '_str'.
my $XML_Attributes_Hash_Ref;

my $XML_Parser = XML::Parser->new(
    Handlers => {
        Start   => \&hdl_xml_tag_start,
        End     => \&hdl_xml_tag_end,
        Char    => \&hdl_xml_nonmarkup_char,
        Default => \&hdl_xml_default,
    }
);

# This event is generated when an XML start tag is recognized. Parser is an XML::Parser::Expat instance.
sub hdl_xml_tag_start
{
    my ($parser, $element, %attributes) = @_;
    $attributes{ '_str' } = "$element:";
    $XML_Attributes_Hash_Ref = \%attributes;
    return;
}

# This event is generated when an XML end tag is recognized. Note that an XML empty tag (<foo/>) generates both a start and an end event.
sub hdl_xml_tag_end
{
    my ($parser, $element) = @_;
    #format_message($XML_Attributes_Hash_Ref);
    # format_svn_history() is defined elsewhere in my script (not shown here).
    format_svn_history($XML_Attributes_Hash_Ref);
    return;
}

# This event is generated when non-markup is recognized. The non-markup sequence of characters is in String.
# A single non-markup sequence of characters may generate multiple calls to this handler.
sub hdl_xml_nonmarkup_char
{
    my ($parser, $string) = @_;
    $XML_Attributes_Hash_Ref->{ '_str' } .= $string;
    return;
}

# This is called for any characters that don't have a registered handler.
sub hdl_xml_default { return; }
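For comparison, here is a minimal sketch of how XML::Parser handlers could collect one hash per <logentry> instead of reacting to each tag separately. It assumes that only the leaf elements author, date, msg and path carry text of interest. The helper name parse_svn_log and the hash keys revision and types are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: turn `svn log --xml` style text into a list of hashes,
# one per <logentry>, skipping every <path> whose kind attribute is "dir".
sub parse_svn_log {
    my ($xml) = @_;
    my (@entries, $entry, $path_attr);
    my $text = '';

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $element, %attr) = @_;
                $text = '';    # begin collecting text for this element
                if ($element eq 'logentry') {
                    $entry = { revision => $attr{revision}, types => [] };
                }
                elsif ($element eq 'path') {
                    $path_attr = \%attr;
                }
            },
            Char => sub {
                my ($expat, $string) = @_;
                $text .= $string;    # may fire several times for one text run
            },
            End => sub {
                my ($expat, $element) = @_;
                if ($element eq 'logentry') {
                    push @entries, $entry;
                }
                elsif ($element eq 'path') {
                    # Keep only the extension of paths that are not directories.
                    if (($path_attr->{kind} // '') ne 'dir'
                        && $text =~ /\.([^.\/]+)\z/) {
                        push @{ $entry->{types} }, $1;
                    }
                }
                elsif ($element =~ /\A(?:author|date|msg)\z/) {
                    $entry->{$element} = $text;
                }
                $text = '';
            },
        },
    );
    $parser->parse($xml);
    return \@entries;
}
```

Called on the sample log above, this should return a single entry for revision 3989 whose types list contains only cpp, since the kind="dir" path is skipped.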
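Why doesn't XML::Parser work for you? – friedo
I really like XML::Twig, not least because it lets me "purge" memory as I go. – Sobrique
@friedo, I edited my question. It works, but I would rather get the whole thing as a hash right away. – Busch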
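Following up on the XML::Twig suggestion, a rough sketch of what that could look like for the log above. It assumes revision numbers are unique, so they can serve as hash keys. The helper name svn_log_to_hash and the key types are invented for illustration. Paths are looked up anywhere below each <logentry>, so it does not matter whether they sit inside <paths> or not.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns a hash ref keyed by revision, e.g.
# { 3989 => { author => ..., date => ..., msg => ..., types => [...] } }.
sub svn_log_to_hash {
    my ($xml) = @_;
    my %log;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once for each fully parsed <logentry> element.
            logentry => sub {
                my ($t, $entry) = @_;
                my @types;
                for my $path ($entry->descendants('path')) {
                    # Skip directories, as asked in the question.
                    next if ($path->att('kind') // '') eq 'dir';
                    # Keep the file extension, if the path has one.
                    push @types, $1 if $path->text =~ /\.([^.\/]+)\z/;
                }
                $log{ $entry->att('revision') } = {
                    author => $entry->first_child_text('author'),
                    date   => $entry->first_child_text('date'),
                    msg    => $entry->first_child_text('msg'),
                    types  => \@types,
                };
                $t->purge;    # free the memory held by entries already handled
            },
        },
    );
    $twig->parse($xml);    # use parsefile($filename) to read from disk instead
    return \%log;
}
```

The resulting hash can then be walked when inserting into the database. The purge call is what keeps memory use flat when the log holds many <logentry> elements.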