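How do I recursively remove empty child elements at a specific XPath location in XML using Nokogiri?

I have the XML below, in which several child elements have empty text. How can I use Nokogiri to recursively remove the empty child elements at a specific XPath location?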

doc = <<'XML' 
<Book> 
    <BookId>BK45647</BookId> 
    <BookName>The Client by John Grisham</BookName> 
    <BookAuthenticationCode></BookAuthenticationCode> 
    <BookCategory>Suspense</BookCategory> 
    <BookSequence></BookSequence> 
    <BookPublisherInfo> 
     <PublisherId>PBBK12345</PublisherId> 
     <PublisherName>Mc.GrawHill</PublisherName> 
     <PublisherIndex></PublisherIndex> 
     <PublisherCategoryQuota></PublisherCategoryQuota> 
    </BookPublisherInfo> 
    <BookPurchaselist> 
     <Customer> 
      <FirstName>John</FirstName> 
      <LastName>Smith</LastName> 
      <MiddleName></MiddleName> 
      <NickName></NickName> 
     </Customer> 
     <Customer> 
      <FirstName>Winston</FirstName> 
      <LastName>Churchill</LastName> 
      <MiddleName></MiddleName> 
      <NickName></NickName> 
     </Customer> 
    </BookPurchaselist> 
</Book> 
XML

I tried the following code, but it does not work correctly.

cust = doc.at_xpath("//Customer") 
cust.each do |cust_obj| 
    if cust_obj.has_text? == false 
     cust_obj.delete 
    end 
end

It misbehaves and produces the following output:

<Book> 
    <BookId>BK45647</BookId> 
    <BookName>The Client by John Grisham</BookName> 
    <BookAuthenticationCode></BookAuthenticationCode> 
    <BookCategory>Suspense</BookCategory> 
    <BookSequence></BookSequence> 
    <BookPublisherInfo> 
     <PublisherId>PBBK12345</PublisherId> 
     <PublisherName>Mc.GrawHill</PublisherName> 
     <PublisherIndex></PublisherIndex> 
     <PublisherCategoryQuota></PublisherCategoryQuota> 
    </BookPublisherInfo> 
    <BookPurchaselist> 
     <Customer> 
      <FirstName>John</FirstName> 
      <LastName>Smith</LastName> 
      <MiddleName></MiddleName> 
     </Customer> 
     <Customer> 
      <FirstName>Winston</FirstName> 
      <LastName>Churchill</LastName> 
      <NickName></NickName> 
     </Customer> 
    </BookPurchaselist> 
</Book>

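Only some of the elements with empty text were removed, while others are still there. How can I recursively remove the empty elements under a specific XPath and write the XML back out?

I'm stuck here and would appreciate any suggestions.

Source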

2012-07-06 user1023627

doc.xpath('//Customer/child::*[not(text())]').each do |node| 
    node.remove 
end

If you want to remove only nodes that have no children at all, you can use not(node()) instead.

EDIT: Full working example (using the same code as above)

require 'nokogiri' 

xml = <<-XML 
<Book> 
    <BookId>BK45647</BookId> 
    <BookName>The Client by John Grisham</BookName> 
    <BookAuthenticationCode></BookAuthenticationCode> 
    <BookCategory>Suspense</BookCategory> 
    <BookSequence></BookSequence> 
    <BookPublisherInfo> 
     <PublisherId>PBBK12345</PublisherId> 
     <PublisherName>Mc.GrawHill</PublisherName> 
     <PublisherIndex></PublisherIndex> 
     <PublisherCategoryQuota></PublisherCategoryQuota> 
    </BookPublisherInfo> 
    <BookPurchaselist> 
     <Customer> 
      <FirstName>John</FirstName> 
      <LastName>Smith</LastName> 
      <MiddleName></MiddleName> 
     </Customer> 
     <Customer> 
      <FirstName>Winston</FirstName> 
      <LastName>Churchill</LastName> 
      <NickName></NickName> 
     </Customer> 
    </BookPurchaselist> 
</Book> 
XML 

doc = Nokogiri.parse(xml) 

doc.xpath('//Customer/child::*[not(text())]').each do |node| 
    node.remove 
end 

puts doc.to_s

The output of this program is:

<?xml version="1.0"?> 
<Book> 
    <BookId>BK45647</BookId> 
    <BookName>The Client by John Grisham</BookName> 
    <BookAuthenticationCode/> 
    <BookCategory>Suspense</BookCategory> 
    <BookSequence/> 
    <BookPublisherInfo> 
     <PublisherId>PBBK12345</PublisherId> 
     <PublisherName>Mc.GrawHill</PublisherName> 
     <PublisherIndex/> 
     <PublisherCategoryQuota/> 
    </BookPublisherInfo> 
    <BookPurchaselist> 
     <Customer> 
      <FirstName>John</FirstName> 
      <LastName>Smith</LastName> 

     </Customer> 
     <Customer> 
      <FirstName>Winston</FirstName> 
      <LastName>Churchill</LastName> 

     </Customer> 
    </BookPurchaselist> 
</Book>

Source

2012-07-06 18:41:19

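It's not working. The whole node is getting removed. – user1023627 2012-07-08 11:56:51

What do you mean by "the whole node is getting removed"? Isn't that what you wanted: removing all empty child nodes of 'Customer'? – 2012-07-08 12:21:03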
