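Parsing inner tags using Nokogiri
I'm stuck: I can't parse irregularly nested HTML tags. Is there a way to strip all HTML tags from a node and keep all of its text?
The code I'm using: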
    rows = doc.search('//table[@id="table_1"]/tbody/tr')
    details = rows.collect do |row|
      detail = {}
      [
        [:word, 'td[1]/text()'],
        [:meaning, 'td[6]/font'],
      ].each do |name, xpath|
        # at_xpath returns only the first matching node (or nil)
        detail[name] = row.at_xpath(xpath).to_s.strip
      end
      detail
    end
Using the XPath:
[:meaning, 'td[6]/font']
produces
:meaning: ! '<font size="3">asking for information specifying <font
color="#CC0000" size="3">what is your name?</font> /what/ as in, <font color="#CC0000" size="3">I'm not sure what you mean</font>
/what/ as in <a style="text-decoration: none;" href="http://somesecretlink.com">what</a></font>
On the other hand, using the XPath:
'td/font/text()'
produces
:meaning: asking for information specifying
which ignores all of the node's children. What I want to get is this:
:meaning: asking for information specifying what is your name? /what/ as in, I'm not sure what you mean /what/ as in what? I can't hear you
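I don't see where the first font tag gets closed. Have you tried ('td/font').text? – Roman 2011-05-22 22:15:20
Roman, I've corrected the output. It does produce the closing font tag. – PunjCoder 2011-05-22 22:36:07
OK, so have you tried row.at_xpath('td[6]/font').text? – Roman 2011-05-22 22:49:17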