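How do I parse an HTML table row by row with Nokogiri and CSS selectors?
I have an HTML table that I want to parse. I want to step through each <TR> and extract the href. The HTML looks like this: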
<table id="classified_table" class="vs-classified-table widget-off top" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td id="classified_cell">
<table class="vs-classified-table widget-off" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr id="vs_classified_73634384" class="classified row1 kiwii-clad-row kiwii-clad-featured">
<tr id="vs_classified_74530668" class="classified row2 kiwii-clad-row kiwii-clad-featured">
<tr id="vs_classified_62296263" class="classified row3 kiwii-clad-row kiwii-clad-featured">
<tr id="vs_classified_62468547" class="classified row4 kiwii-clad-row kiwii-clad-featured">
<tr id="vs_classified_47122034" class="classified row5 kiwii-clad-row kiwii-clad-featured">
<tr id="vs_classified_78210646" class="classified row6 kiwii-clad-row">
<tr id="vs_classified_78207083" class="classified row7 kiwii-clad-row">
<tr id="vs_classified_69104369" class="classified row8 kiwii-clad-row">
<tr id="vs_classified_78113204" class="classified row9 kiwii-clad-row">
<tr id="vs_classified_52761813" class="classified row10 kiwii-clad-row">
<tr id="vs_classified_78121746" class="classified row11 kiwii-clad-row">
<tr id="vs_classified_76515548" class="classified row12 kiwii-clad-row">
<tr id="vs_advert_middle" class="vs-advertisement advertisment-middle-2 vs-adsense-middle-BR-" style="border:none">
<tr id="vs_classified_34048811" class="classified row13 kiwii-clad-row">
My Ruby code looks like this:
require 'nokogiri'
require 'open-uri'
# On Ruby 3.0+ URI.open is needed; plain Kernel#open no longer fetches URLs
page = Nokogiri::HTML(URI.open('http://servico-informatica.vivanuncios.com/computador+rio-de-janeiro-capital/'))
# This selector names one specific row id and one specific link id
rows = page.css('tr#vs_classified_73634384.classified td.summary div a#vs-detail-link-1.kiwii-clear-none')
puts rows.text # this works
rows[1..10].each do |row|
  puts "this isn't working :("
end
The first puts successfully prints the text of the first <TR>, but the puts inside the each loop doesn't print anything.
The page I want to scrape is: http://servico-informatica.vivanuncios.com/computador+rio-de-janeiro-capital/
Can you post an example of the output you're looking for? Do you want an array of just the links? Do you want the text inside the links? – miah
Your HTML sample is invalid and is missing the HREFs, and you haven't said which HREFs you're interested in. –