How do I find direct children, not nested children, using Rails and Nokogiri?

I'm using Rails 4.2.7 with Ruby (2.3) and Nokogiri. How do I find only the direct tr children of a table, rather than nested ones? Currently I find the rows of each table like this:

tables = doc.css('table')
tables.each do |table|
    rows = table.css('tr')   # also matches the <tr>s of tables nested inside this one
end

This finds not only the table's direct rows,

<table> 
    <tbody> 
     <tr>…</tr>

but also rows nested inside rows, for example:

<table> 
    <tbody> 
     <tr> 
      <td> 
       <table> 
        <tr>This is found</tr> 
       </table> 
      </td> 
     </tr>

How can I refine my search to find only the direct tr elements?

2016-11-30 Dave

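Nokogiri implements CSS, including some jQuery extensions, so get familiar with how stylesheet selectors work and you should have better luck. CSS is more readable, but XPath is more powerful, so it's good to know both. The 'tbody' tag is rarely used in generated HTML, but browsers tend to insert it when you view a page's HTML. Don't trust the browser; look at the HTML directly from the command line with 'wget', 'curl' or 'nokogiri'. Only use 'tbody' if the raw HTML contains it. –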

@Dave: Just curious: why would you accept an answer but not upvote it? –

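You can do this in a few steps with XPath. First find the table's "level" (i.e. how deeply it is nested inside other tables), then find all descendant tr elements that have the same number of table ancestors: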

tables = doc.xpath('//table') 
tables.each do |table| 
    level = table.xpath('count(ancestor-or-self::table)') 
    rows = table.xpath(".//tr[count(ancestor::table) = #{level}]") 
    # do what you want with rows... 
end

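In the more general case, where you might have tr elements nested directly inside other trs, you can do something like this (that would be invalid HTML, but you might have XML or some other tags):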

tables.each do |table| 
    # Find the first descendant tr, and determine its level. This 
    # will be a "top-level" tr for this table. "level" here means how 
    # many tr elements (including itself) are between it and the 
    # document root. 
    level = table.xpath("count(descendant::tr[1]/ancestor-or-self::tr)") 
    # Now find all descendant trs that have that same level. Since 
    # the table itself is at a fixed level, this means all these nodes 
    # will be "top-level" rows for this table. 
    rows = table.xpath(".//tr[count(ancestor-or-self::tr) = #{level}]") 
    # handle rows... 
end

The first step could be split into two separate queries, which may be clearer:

first_tr = table.at_xpath(".//tr") 
level = first_tr.xpath("count(ancestor-or-self::tr)")

(This fails if a table has no trs, though, because first_tr will be nil. The combined XPath above handles that case correctly.)

2016-11-30 18:37:33 matt

Interesting how it can be done with an XPath count, but it only works for 'tag_a/tag_b/tag_a/tag_b' structures (e.g. '/table/tr/table/tr'), not for 'tag_a/tag_b/tag_b'. The OP said he wanted a general answer. –

@EricDuminil You can extend this technique to handle cases like 'tag_a/tag_b/tag_b'. It's a bit more complicated, but not much. – matt

Thanks for your answer. Would you mind showing the corresponding solution? I don't really know XPath and would like to learn. –

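I don't know whether it can be done directly with CSS/XPath, so I wrote a small method that finds nodes recursively. It stops recursing as soon as it finds a match.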

xml= %q{ 
<root> 
    <table> 
    <tbody> 
     <tr nested="false"> 
     <td> 
      <table> 
      <tr nested="true"> 
       This is found</tr> 
      </table> 
     </td> 
     </tr> 
    </tbody> 
    </table> 
    <another_table>
    <tr nested="false">
     <tr nested="true"/>
    </tr>
    </another_table>
    <tr nested = "false"/> 
</root> 
} 

require 'nokogiri' 

doc = Nokogiri::XML.parse(xml) 

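class Nokogiri::XML::Node
    # Return [self] if this node is named desired_node; otherwise search the
    # element children recursively, never descending into a node that matched.
    def first_children_found(desired_node)
        if name == desired_node
            [self]
        else
            element_children.flat_map { |child| child.first_children_found(desired_node) }
        end
    end
end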

doc.first_children_found('tr').each do |tr| 
    puts tr["nested"] 
end 

#=> 
# false 
# false 
# false

2016-11-30 17:26:01

It doesn't. If the HTML is a table followed directly by a tr (with nothing in between), it fails. – Dave

It isn't clear from your example. Do you have both 'table/tbody/tr' and 'table/tr'? –

My question goes beyond that. How do I select the tables that are not nested inside other trs? – Dave
