2015-05-14

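I have the following HTML and need to use the <span> id to determine the index of "Number of Strings". I am using Nokogiri to parse the HTML and get the row. How can I get the index of the 'th' element with Nokogiri?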

require 'nokogiri'

# Read the file from disk and parse it as HTML
doc = Nokogiri::HTML(File.read('myfile.html'))
table = doc.xpath("//span[@id='NumStrs']/../../..")
row = table.xpath("tr[1]")

Here is the HTML:

<tr> 
<th id ="langframe"> 
<span id="cabinet"> 
Cabinet</span> 
</th> 
<th id ="langbb1"> 
<span id="bb1"> 
BB1</span> 
</th> 
<th id ="langbb2"> 
<span id="bb2"> 
BB2</span> 
</th> 
<th id ="langtemp"> 
<span id="Temp"> 
Temperature</span> 
</th> 
<th id="langstrs"> 
<span id="StringsPresent"> 
Strings Present</span> 
</th> 
<th id="langmstrQty"> 
<span id="NumStrs"> 
Number of Strings</span> 
</th> 
</tr> 
What is your expected output?

Index relative to what? What should the index of "Number of Strings" be?

Answers

I got it working. I'm not sure whether this is an efficient way to do it, but it works:

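id = 'NumStrs' # the span id from the question
header = table.xpath("tr[1]")
# ".//" keeps the search inside the header row; strip so the value matches the stripped texts below
value = header.xpath(".//span[@id='#{id}']").text.strip
index = header.search('th//text()').collect { |text| text.to_s.strip }.reject(&:empty?).index(value) + 1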

I would do it using Ruby's with_index combined with select:

require 'nokogiri' # => true 

doc = Nokogiri::HTML(<<EOT) 
<tr> 
<th id ="langframe"> 
<span id="cabinet"> 
Cabinet</span> 
</th> 
<th id ="langbb1"> 
<span id="bb1"> 
BB1</span> 
</th> 
<th id ="langbb2"> 
<span id="bb2"> 
BB2</span> 
</th> 
<th id ="langtemp"> 
<span id="Temp"> 
Temperature</span> 
</th> 
<th id="langstrs"> 
<span id="StringsPresent"> 
Strings Present</span> 
</th> 
<th id="langmstrQty"> 
<span id="NumStrs"> 
Number of Strings</span> 
</th> 
</tr> 
EOT 

th_idx = doc.search('th').to_enum.with_index.select { |th, idx| th.text['Number of Strings'] }.first 

This returns:

th_idx 
# => [#(Element:0x3fe72d83cd3c { 
#  name = "th", 
#  attributes = [ 
#   #(Attr:0x3fe72d4440f4 { name = "id", value = "langmstrQty" })], 
#  children = [ 
#   #(Text "\n"), 
#   #(Element:0x3fe72d43c3e0 { 
#   name = "span", 
#   attributes = [ 
#    #(Attr:0x3fe72d439b04 { name = "id", value = "NumStrs" })], 
#   children = [ #(Text "\nNumber of Strings")] 
#   }), 
#   #(Text "\n")] 
#  }), 
#  5] 

The index is:

th_idx.last # => 5 

Once you have th_idx, you can easily access the parent or child nodes to learn about its surroundings:

th_node = th_idx.first 
th_node['id'] # => "langmstrQty" 
th_node.at('span') 
# => #(Element:0x3fd5110286d8 { 
#  name = "span", 
#  attributes = [ 
#  #(Attr:0x3fd511021b6c { name = "id", value = "NumStrs" })], 
#  children = [ #(Text "\nNumber of Strings")] 
#  }) 
th_node.at('span')['id'] # => "NumStrs" 

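with_index adds a 0-based index to each element passed to it. to_enum is needed because search returns a NodeSet, which is not an Enumerator, so to_enum returns one.

If you want a 1-based index, use with_index(1).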
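
To see the difference between the two offsets without parsing any HTML, here is a small sketch using a plain Array of the header texts from the question. The headers array and the variable names are assumptions made for illustration; an Array stands in for the NodeSet, and each without a block plays the role of to_enum.

```ruby
# Header texts from the question's row, in document order.
headers = ['Cabinet', 'BB1', 'BB2', 'Temperature',
           'Strings Present', 'Number of Strings']

# each without a block returns an Enumerator, so with_index can be chained.
zero_based = headers.each.with_index
                    .select { |text, _idx| text == 'Number of Strings' }
                    .first

# Passing 1 to with_index starts the count at 1 instead of 0.
one_based = headers.each.with_index(1)
                   .select { |text, _idx| text == 'Number of Strings' }
                   .first

p zero_based # => ["Number of Strings", 5]
p one_based  # => ["Number of Strings", 6]
```

As in the answer's code, each result is a pair of the matched element and its index, so last gives the index on its own.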
