2012-07-06 12 views
1

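I have the XML below, in which several child elements have empty text. How can I use Nokogiri to recursively remove the empty child elements at a specific XPath location?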

doc = <<'XML' 
<Book> 
    <BookId>BK45647</BookId> 
    <BookName>The Client by John Grisham</BookName> 
    <BookAuthenticationCode></BookAuthenticationCode> 
    <BookCategory>Suspense</BookCategory> 
    <BookSequence></BookSequence> 
    <BookPublisherInfo> 
     <PublisherId>PBBK12345</PublisherId> 
     <PublisherName>Mc.GrawHill</PublisherName> 
     <PublisherIndex></PublisherIndex> 
     <PublisherCategoryQuota></PublisherCategoryQuota> 
    </BookPublisherInfo> 
    <BookPurchaselist> 
     <Customer> 
      <FirstName>John</FirstName> 
      <LastName>Smith</LastName> 
      <MiddleName></MiddleName> 
      <NickName></NickName> 
     </Customer> 
     <Customer> 
      <FirstName>Winston</FirstName> 
      <LastName>Churchill</LastName> 
      <MiddleName></MiddleName> 
      <NickName></NickName> 
     </Customer> 
    </BookPurchaselist> 
</Book> 
XML 

I tried the following code, but it does not work properly.

cust = doc.at_xpath("//Customer") 
cust.each do |cust_obj| 
    if cust_obj.has_text? == false 
     cust_obj.delete 
    end 
end 

It only partly works and gives the following output:

<Book> 
    <BookId>BK45647</BookId> 
    <BookName>The Client by John Grisham</BookName> 
    <BookAuthenticationCode></BookAuthenticationCode> 
    <BookCategory>Suspense</BookCategory> 
    <BookSequence></BookSequence> 
    <BookPublisherInfo> 
     <PublisherId>PBBK12345</PublisherId> 
     <PublisherName>Mc.GrawHill</PublisherName> 
     <PublisherIndex></PublisherIndex> 
     <PublisherCategoryQuota></PublisherCategoryQuota> 
    </BookPublisherInfo> 
    <BookPurchaselist> 
     <Customer> 
      <FirstName>John</FirstName> 
      <LastName>Smith</LastName> 
      <MiddleName></MiddleName> 
     </Customer> 
     <Customer> 
      <FirstName>Winston</FirstName> 
      <LastName>Churchill</LastName> 
      <NickName></NickName> 
     </Customer> 
    </BookPurchaselist> 
</Book> 

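Only some of the elements with empty text get removed, and several of them remain. How can I recursively remove the elements with empty data under a specific XPath and write the XML back out?

I'm stuck here and would appreciate suggestions.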

Answer

4
doc.xpath('//Customer/child::*[not(text())]').each do |node| 
    node.remove 
end 

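The predicate not(text()) matches elements that have no text node as a direct child. If you instead want to remove only nodes that have no children at all, you can use not(node()).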

EDIT: Full working example (using the same code as above)

require 'nokogiri' 

xml = <<-XML 
<Book> 
    <BookId>BK45647</BookId> 
    <BookName>The Client by John Grisham</BookName> 
    <BookAuthenticationCode></BookAuthenticationCode> 
    <BookCategory>Suspense</BookCategory> 
    <BookSequence></BookSequence> 
    <BookPublisherInfo> 
     <PublisherId>PBBK12345</PublisherId> 
     <PublisherName>Mc.GrawHill</PublisherName> 
     <PublisherIndex></PublisherIndex> 
     <PublisherCategoryQuota></PublisherCategoryQuota> 
    </BookPublisherInfo> 
    <BookPurchaselist> 
     <Customer> 
      <FirstName>John</FirstName> 
      <LastName>Smith</LastName> 
      <MiddleName></MiddleName> 
     </Customer> 
     <Customer> 
      <FirstName>Winston</FirstName> 
      <LastName>Churchill</LastName> 
      <NickName></NickName> 
     </Customer> 
    </BookPurchaselist> 
</Book> 
XML 

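# Parse the XML string into a Nokogiri document 
doc = Nokogiri.parse(xml) 

# Remove every direct child of a Customer element that has no text node 
doc.xpath('//Customer/child::*[not(text())]').each do |node| 
    node.remove 
end 

# Print the modified document 
puts doc.to_s 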

The output of this program is:

<?xml version="1.0"?> 
<Book> 
    <BookId>BK45647</BookId> 
    <BookName>The Client by John Grisham</BookName> 
    <BookAuthenticationCode/> 
    <BookCategory>Suspense</BookCategory> 
    <BookSequence/> 
    <BookPublisherInfo> 
     <PublisherId>PBBK12345</PublisherId> 
     <PublisherName>Mc.GrawHill</PublisherName> 
     <PublisherIndex/> 
     <PublisherCategoryQuota/> 
    </BookPublisherInfo> 
    <BookPurchaselist> 
     <Customer> 
      <FirstName>John</FirstName> 
      <LastName>Smith</LastName> 

     </Customer> 
     <Customer> 
      <FirstName>Winston</FirstName> 
      <LastName>Churchill</LastName> 

     </Customer> 
    </BookPurchaselist> 
</Book> 

It's not working.. the whole node is getting deleted. – user1023627 2012-07-08 11:56:51

What do you mean by "the whole node is getting deleted"? Isn't that what you want, to remove all empty child nodes of 'Customer'? – 2012-07-08 12:21:03
