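If your data really is that regular and you don't need the attributes of the <a> elements, you can parse the text of each table cell without worrying about the <br> elements at all. The goal is to end up with this in chunks: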
[
[ "Link 1 (info1), Blah 1", "Link 2 (info1), Blah 1", "Link 3 (info2), Blah 1 Foo 2" ],
[ "Link 4 (info1), Blah 2", "Link 5 (info1), Blah 2", "Link 6 (info2), Blah 2 Foo 2" ],
[ "Link 7 (info1), Blah 3", "Link 8 (info1), Blah 3", "Link 9 (info2), Blah 3 Foo 2" ],
[ "Link A (info1), Blah 4", "Link B (info1), Blah 4", "Link C (info2), Blah 4 Foo 2" ]
]
Given some HTML like this in html:
<table>
<tbody>
<tr>
<td class="j">
<a title="title text1" href="http://link1.com">Link 1</a> (info1), Blah 1,<br>
<a title="title text2" href="http://link2.com">Link 2</a> (info1), Blah 1,<br>
<a title="title text2" href="http://link3.com">Link 3</a> (info2), Blah 1 Foo 2,<br>
</td>
<td class="j">
<a title="title text1" href="http://link4.com">Link 4</a> (info1), Blah 2,<br>
<a title="title text2" href="http://link5.com">Link 5</a> (info1), Blah 2,<br>
<a title="title text2" href="http://link6.com">Link 6</a> (info2), Blah 2 Foo 2,<br>
</td>
</tr>
<tr>
<td class="j">
<a title="title text1" href="http://link7.com">Link 7</a> (info1), Blah 3,<br>
<a title="title text2" href="http://link8.com">Link 8</a> (info1), Blah 3,<br>
<a title="title text2" href="http://link9.com">Link 9</a> (info2), Blah 3 Foo 2,<br>
</td>
<td class="j">
<a title="title text1" href="http://linkA.com">Link A</a> (info1), Blah 4,<br>
<a title="title text2" href="http://linkB.com">Link B</a> (info1), Blah 4,<br>
<a title="title text2" href="http://linkC.com">Link C</a> (info2), Blah 4 Foo 2,<br>
</td>
</tr>
</tbody>
</table>
You can do this:
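# Strip each match: every entry after the first starts with the newline that followed the <br>.
chunks = doc.search('.j').map { |td| td.text.strip.scan(/[^,]+,[^,]+/).map(&:strip) }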
and you get the chunks array shown above. You can then convert it into whatever hash structure you need.
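As a rough sketch of that last step, here is one way to turn each entry into a hash. The field names (:name, :info, :note), the ENTRY pattern, and the entry_to_hash helper are all made up for illustration, and the pattern assumes every entry has the shape "Name (info), note"; adjust it if your real data differs.

```ruby
# Hypothetical pattern and field names; the original data does not name these parts.
ENTRY = /\A(?<name>[^(]+?)\s*\((?<info>[^)]*)\),\s*(?<note>.+)\z/

# Turn one "Name (info), note" string into a hash, or nil if it does not fit.
def entry_to_hash(entry)
  m = ENTRY.match(entry)
  return nil unless m
  { name: m[:name], info: m[:info], note: m[:note] }
end

# A shortened copy of the chunks array from above, so the sketch runs on its own.
chunks = [
  ["Link 1 (info1), Blah 1", "Link 2 (info1), Blah 1", "Link 3 (info2), Blah 1 Foo 2"],
  ["Link 4 (info1), Blah 2", "Link 5 (info1), Blah 2", "Link 6 (info2), Blah 2 Foo 2"]
]

# One array of hashes per table cell; entries that do not match are dropped.
records = chunks.map { |cell| cell.map { |entry| entry_to_hash(entry) }.compact }
```

Here records keeps the same nesting as chunks, so records[0] holds the hashes for the first cell. If you would rather have one flat list, flat_map in place of the outer map should do it.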
This worked for me. Thank you very much!