Scraping HTML attributes

<tr valign="middle" align="center"> 
<td><b>someNumbers</b></td> 
<td width="22" height="22" background="..." class="SomeIntrestingClass">xxxxx</td> 
<td width="22" height="22" background="..." class="SomeIntrestingClass">xgdsx</td> 
<td width="22" height="22" background="..." class="SomeIntrestingClass">xyzzx</td> 
<td width="22">&nbsp;</td></tr>

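I am building an application that needs data from a website. I need to extract the value in 'someNumbers' as well as the values in the td cells, e.g. 'xyzzx'...
The problem I am running into is that 'someNumbers' has no class, so I tried using
doc.getElementsByAttributeValue(key, value)
but other parts of the document have the same attributes. How can I extract these values using JSoup, or is there another sensible approach? Thanks for any suggestions.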

Source

2012-12-22 wtsang02

Can you select all the 'td' elements and just get their text content? – nhahtdh

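I can select the td tags. However, that would return about 1k results, of which I only use 30%, and 'someNumbers' would be hard to tell apart from the rest. But I'll try. – wtsang02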

Document.select(...) is the method that does this. It accepts "CSS selectors" such as td.class or tr td #id; just use them as described in this article on Jsoup CSS selectors.
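As a rough illustration of the selector approach, here is a minimal sketch rather than a definitive solution. It assumes the rows of interest are exactly those containing at least one td with the class SomeIntrestingClass, and that the label is the b element directly inside an unclassed td of the same row. The class name RowScraper and the method extractRows are hypothetical names chosen for this example, and the sample markup wraps the tr in a table, because an HTML parser drops table rows that appear outside a table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RowScraper {

    // Hypothetical helper: maps each row's bold label to the text of its classed cells.
    // Rows that share the same label overwrite each other in this simple version.
    public static Map<String, List<String>> extractRows(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, List<String>> rows = new LinkedHashMap<>();

        // Keep only rows that contain at least one cell with the class of interest.
        for (Element row : doc.select("tr:has(td.SomeIntrestingClass)")) {
            // The label cell has no class, so find it by structure: a <b> directly inside a <td>.
            Element label = row.select("td > b").first();
            if (label == null) {
                continue; // no bold label in this row, skip it
            }
            List<String> values = new ArrayList<>();
            for (Element cell : row.select("td.SomeIntrestingClass")) {
                values.add(cell.text());
            }
            rows.put(label.text(), values);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample markup modelled on the question, wrapped in <table> (an assumption).
        String html = "<table>"
            + "<tr valign=\"middle\" align=\"center\">"
            + "<td><b>someNumbers</b></td>"
            + "<td width=\"22\" height=\"22\" class=\"SomeIntrestingClass\">xxxxx</td>"
            + "<td width=\"22\" height=\"22\" class=\"SomeIntrestingClass\">xgdsx</td>"
            + "<td width=\"22\" height=\"22\" class=\"SomeIntrestingClass\">xyzzx</td>"
            + "<td width=\"22\">&nbsp;</td></tr>"
            + "</table>";
        System.out.println(extractRows(html));
    }
}
```

Scoping the second select to each row is what keeps the unclassed label cell tied to its own values, even when the rest of the page contains many other td elements with the same attributes.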

Source

2012-12-22 19:00:10 wtsang02

-1

Use <td[^<]+?>*</[^<]+?> as a regular expression and store all the matches in an array.

Then go through each match, removing <td[^<]+?> first and then </[^<]+?>.

Source

2012-12-22 18:33:13

-1. The OP is already using a proper HTML parser. – nhahtdh

Please read [this](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – wtsang02
