2013-07-11 124 views
10

I'm having some trouble using Nokogiri. How do I parse an XML file with Nokogiri?

I am trying to parse this XML file:

<Collection version="2.0" id="74j5hc4je3b9"> 
    <Name>A Funfair in Bangkok</Name> 
    <PermaLink>Funfair in Bangkok</PermaLink> 
    <PermaLinkIsName>True</PermaLinkIsName> 
    <Description>A small funfair near On Nut in Bangkok.</Description> 
    <Date>2009-08-03T00:00:00</Date> 
    <IsHidden>False</IsHidden> 
    <Items> 
    <Item filename="AGC_1998.jpg"> 
     <Title>Funfair in Bangkok</Title> 
     <Caption>A small funfair near On Nut in Bangkok.</Caption> 
     <Authors>Anthony Bouch</Authors> 
     <Copyright>Copyright © Anthony Bouch</Copyright> 
     <CreatedDate>2009-08-07T19:22:08</CreatedDate> 
     <Keywords> 
     <Keyword>Funfair</Keyword> 
     <Keyword>Bangkok</Keyword> 
     <Keyword>Thailand</Keyword> 
     </Keywords> 
     <ThumbnailSize width="133" height="200" /> 
     <PreviewSize width="532" height="800" /> 
     <OriginalSize width="2279" height="3425" /> 
    </Item> 
    <Item filename="AGC_1164.jpg" iscover="True"> 
     <Title>Bumper Cars at a Funfair in Bangkok</Title> 
     <Caption>Bumper cars at a small funfair near On Nut in Bangkok.</Caption> 
     <Authors>Anthony Bouch</Authors> 
     <Copyright>Copyright © Anthony Bouch</Copyright> 
     <CreatedDate>2009-08-03T22:08:24</CreatedDate> 
     <Keywords> 
     <Keyword>Bumper Cars</Keyword> 
     <Keyword>Funfair</Keyword> 
     <Keyword>Bangkok</Keyword> 
     <Keyword>Thailand</Keyword> 
     </Keywords> 
     <ThumbnailSize width="200" height="133" /> 
     <PreviewSize width="800" height="532" /> 
     <OriginalSize width="3725" height="2479" /> 
    </Item> 
    </Items> 
</Collection> 

I want all of the information displayed on the screen, nothing more. That should be simple, right? This is what I am doing:

require 'nokogiri' 

doc = Nokogiri::XML(File.open("sample.xml")) 
@block = doc.css("items item").map {|node| node.children.text} 
puts @block 

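So each Items is a node, and under it there are Item child nodes?

I create a map of this, which I believe returns a hash; the code in {} goes through each node and places the children's text into @block. Then I can display the text of all the child nodes on the screen.

I have no idea how far off or how close I am. I have read a lot of articles and am still a bit confused about the basics, especially since with a new language I usually start with a basic program that reads from a file and prints to the screen.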

+0

If you do have any questions, let me know. I'll answer them. –

+0

I do have another question: http://stackoverflow.com/questions/17600037/using-nokogiri-to-parse-xml-file It is about how to traverse the node tree. – camdixon

+0

That link points to the question in this post. –

Answers

26

Here I will try to explain all of the questions and points of confusion you have:

require 'nokogiri' 

doc = Nokogiri::XML.parse <<-XML 
<Collection version="2.0" id="74j5hc4je3b9"> 
    <Name>A Funfair in Bangkok</Name> 
    <PermaLink>Funfair in Bangkok</PermaLink> 
    <PermaLinkIsName>True</PermaLinkIsName> 
    <Description>A small funfair near On Nut in Bangkok.</Description> 
    <Date>2009-08-03T00:00:00</Date> 
    <IsHidden>False</IsHidden> 
    <Items> 
    <Item filename="AGC_1998.jpg"> 
     <Title>Funfair in Bangkok</Title> 
     <Caption>A small funfair near On Nut in Bangkok.</Caption> 
     <Authors>Anthony Bouch</Authors> 
     <Copyright>Copyright © Anthony Bouch</Copyright> 
     <CreatedDate>2009-08-07T19:22:08</CreatedDate> 
     <Keywords> 
     <Keyword>Funfair</Keyword> 
     <Keyword>Bangkok</Keyword> 
     <Keyword>Thailand</Keyword> 
     </Keywords> 
     <ThumbnailSize width="133" height="200" /> 
     <PreviewSize width="532" height="800" /> 
     <OriginalSize width="2279" height="3425" /> 
    </Item> 
    <Item filename="AGC_1164.jpg" iscover="True"> 
     <Title>Bumper Cars at a Funfair in Bangkok</Title> 
     <Caption>Bumper cars at a small funfair near On Nut in Bangkok.</Caption> 
     <Authors>Anthony Bouch</Authors> 
     <Copyright>Copyright © Anthony Bouch</Copyright> 
     <CreatedDate>2009-08-03T22:08:24</CreatedDate> 
     <Keywords> 
     <Keyword>Bumper Cars</Keyword> 
     <Keyword>Funfair</Keyword> 
     <Keyword>Bangkok</Keyword> 
     <Keyword>Thailand</Keyword> 
     </Keywords> 
     <ThumbnailSize width="200" height="133" /> 
     <PreviewSize width="800" height="532" /> 
     <OriginalSize width="3725" height="2479" /> 
    </Item> 
    </Items> 
</Collection> 
XML 

You wrote: "From my understanding of Nokogiri, each 'Items' is a node, and under that there are child nodes of 'Item'?"

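Not quite. The query doc.xpath("//Items/Item") returns a Nokogiri::XML::NodeSet, which holds the 2 Item child nodes of Items. Each of those is a Nokogiri::XML::Element object. Since Element is a subclass of Nokogiri::XML::Node, you can also call them nodes.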

doc.class # => Nokogiri::XML::Document 
@block = doc.xpath("//Items/Item") 
@block.class # => Nokogiri::XML::NodeSet 
@block.count # => 2 
@block.map { |node| node.name } 
# => ["Item", "Item"] 
@block.map { |node| node.class } 
# => [Nokogiri::XML::Element, Nokogiri::XML::Element] 
@block.map { |node| node.children.count } 
# => [19, 19] 
@block.map { |node| node.class.superclass } 
# => [Nokogiri::XML::Node, Nokogiri::XML::Node] 

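You wrote: "I create a map of this, which I believe returns a hash; the code in {} goes through each node and places the children's text into @block. Then I can display the text of all the child nodes on the screen."

I don't follow this part, so below I try to explain what a node is and what a NodeSet is in Nokogiri. Remember that a NodeSet is a collection of nodes.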

@chld_class = @block.map do |node| 
    node.children.class 
end 
@chld_class 
# => [Nokogiri::XML::NodeSet, Nokogiri::XML::NodeSet] 
@chld_name = @block.map do |node| 
    node.children.map { |n| [n.name,n.class] } 
end 
@chld_name 
# => [[["text", Nokogiri::XML::Text], 
#  ["Title", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["Caption", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["Authors", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["Copyright", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["CreatedDate", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["Keywords", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["ThumbnailSize", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["PreviewSize", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["OriginalSize", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text]], 
#  [["text", Nokogiri::XML::Text], 
#  ["Title", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["Caption", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["Authors", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["Copyright", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["CreatedDate", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["Keywords", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["ThumbnailSize", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["PreviewSize", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text], 
#  ["OriginalSize", Nokogiri::XML::Element], 
#  ["text", Nokogiri::XML::Text]]] 

@chld_name = @block.map do |node| 
    node.children.map{|n| [n.name,n.text.strip] if n.elem? }.compact 
end.compact 
@chld_name 
# => [[["Title", "Funfair in Bangkok"], 
#  ["Caption", "A small funfair near On Nut in Bangkok."], 
#  ["Authors", "Anthony Bouch"], 
#  ["Copyright", "Copyright © Anthony Bouch"], 
#  ["CreatedDate", "2009-08-07T19:22:08"], 
#  ["Keywords", "Funfair\n  Bangkok\n  Thailand"], 
#  ["ThumbnailSize", ""], 
#  ["PreviewSize", ""], 
#  ["OriginalSize", ""]], 
#  [["Title", "Bumper Cars at a Funfair in Bangkok"], 
#  ["Caption", "Bumper cars at a small funfair near On Nut in Bangkok."], 
#  ["Authors", "Anthony Bouch"], 
#  ["Copyright", "Copyright © Anthony Bouch"], 
#  ["CreatedDate", "2009-08-03T22:08:24"], 
#  ["Keywords", 
#  "Bumper Cars\n  Funfair\n  Bangkok\n  Thailand"], 
#  ["ThumbnailSize", ""], 
#  ["PreviewSize", ""], 
#  ["OriginalSize", ""]]] 
+0

Great answer! (y) –

+0

@LuizDamimn Thank you! –

4

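The element names in the sample XML are capitalized, and selectors are case-sensitive for XML documents, so your code should reflect that. For example: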

require 'nokogiri' 

doc = Nokogiri::XML(File.open("sample.xml")) 
@block = doc.css("Items Item").map { |node| node.children.text } 
puts @block