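Parsing elements using jsoup in Java
I'm using jsoup in my Java application to parse HTML, but now I need to parse table data. For each <tr>, I want to read the value of its first <td> element. If that first cell contains the word "Outdated", the row should be skipped; if it is not outdated, I want to go to the third cell and get the value ending in ".rpm". I can't get it to work. I've tried many approaches without success, so I'm trying my luck here in case someone has experience with this.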
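import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class rpms {
    public static void getTdSibling(String sourceTd) throws FileNotFoundException, UnsupportedEncodingException {
        String fragment = sourceTd;
        Document doc = Jsoup.parseBodyFragment(fragment);
        // getElementsByClass() is meant for a single class name; whether "confluenceTable tablesorter"
        // matches depends on the jsoup version, and first() returns null (NPE) when it does not.
        Elements myElements = doc.getElementsByClass("confluenceTable tablesorter").first().getElementsByTag("tr");
        for (Element element : myElements) {
            // Elements.contains(Object) is a list-membership test, not a text search,
            // so comparing the Element list against the String "Outdated" is always false.
            // Even with a text check, this branch would print the Outdated rows, the opposite of the goal.
            if (element.select("td").contains("Outdated")) {
                // ownText() of a <tr> is empty: the text lives in its <td> children.
                String rpms = element.ownText();
                System.out.println(rpms);
            }
        }
    }

    public static void main(String[] args) {
        // URLget, sendGetRequest(...) and URL are the asker's own helpers (not shown here).
        URLget rpms = new URLget();
        try {
            getTdSibling(sendGetRequest(URL).toString());
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}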
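To see why the `if` in `getTdSibling` never fires, here is a minimal sketch (assuming jsoup is on the classpath; the class name `ContainsDemo` and the one-row table are made up for illustration). `Elements` is a `List<Element>`, so `contains("Outdated")` asks whether the String is one of the Element objects, which it never is; the text of the first cell has to be checked instead.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ContainsDemo {
    public static void main(String[] args) {
        // Hypothetical one-row table mirroring the "Outdated" row of the question.
        Element row = Jsoup.parse("<table><tr><td>Outdated RHSA-2014:1166</td><td>x</td></tr></table>")
                .select("tr").first();
        Elements cells = row.select("td");

        // List membership: is the String "Outdated" one of the Element objects? Never.
        System.out.println(cells.contains("Outdated"));                // false

        // Text test on the first cell: what the question actually needs.
        System.out.println(cells.first().text().contains("Outdated")); // true
    }
}
```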
Here is the HTML table whose elements I'm trying to parse:
<table class="confluenceTable tablesorter">
<tbody class="">
<tr>
<td colspan="1" class="confluenceTd">RHSA-2014:1172</td>
<td colspan="1" class="confluenceTd">
<p>The procmail program is used for local mail delivery. In addition to just
<br>delivering mail, procmail can be used for automatic filtering, presorting,
<br>and other mail handling jobs.</p>
<p>A heap-based buffer overflow flaw was found in procmail's formail utility.
<br>A remote attacker could send an email with specially crafted headers that,
<br>when processed by formail, could cause procmail to crash or, possibly,
<br>execute arbitrary code as the user running formail. (CVE-2014-3618)
</p>
</td>
<td colspan="1" class="confluenceTd">procmail-3.22-17.1.2.x86_64.rpm</td>
<td colspan="1" class="confluenceTd">
<img class="emoticon emoticon-tick" src="/s/en_GB-1988229788/4733/f235dd088df5682b0560ab6fc66ed22c9124c0be.57/_/images/icons/emoticons/check.png" data-emoticon-name="tick" alt="(tick)">
</td>
</tr>
<tr>
<td colspan="1" class="confluenceTd">Outdated RHSA-2014:1166</td>
<td colspan="1" class="confluenceTd">
<p>Jakarta Commons HTTPClient implements the client side of HTTP standards.</p>
<p>It was discovered that the HTTPClient incorrectly extracted host name from
<br>an X.509 certificate subject's Common Name (CN) field. A man-in-the-middle
<br>attacker could use this flaw to spoof an SSL server using a specially
<br>crafted X.509 certificate. (CVE-2014-3577)</p>
</td>
<td colspan="1" class="confluenceTd">
<p>jakarta-commons-httpclient-3.0-7jpp.4.el5_10.x86_64.rpm</p>
<p>jakarta-commons-httpclient-demo-3.0-7jpp.4.el5_10.x86_64.rpm</p>
<p>jakarta-commons-httpclient-javadoc-3.0-7jpp.4.el5_10.x86_64.rpm</p>
<p>jakarta-commons-httpclient-manual-3.0-7jpp.4.el5_10.x86_64.rpm</p>
</td>
</tr>
<tr>
<td colspan="1" class="confluenceTd">RHSA-2014:1148-1</td>
<td colspan="1" class="confluenceTd">
<p>A flaw was found in the way Squid handled malformed HTTP Range headers.
<br>A remote attacker able to send HTTP requests to the Squid proxy could use
<br>this flaw to crash Squid. (CVE-2014-3609)
</p>
<p>A buffer overflow flaw was found in Squid's DNS lookup module. A remote
<br>attacker able to send HTTP requests to the Squid proxy could use this flaw
<br>to crash Squid. (CVE-2013-4115)</p>
</td>
<td colspan="1" class="confluenceTd"><span>squid-2.6.STABLE21-7.el5_10.x86_64.rpm</span>
</td>
<td colspan="1" class="confluenceTd"></td>
</tr>
</table>
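One way to implement the rule described above (skip a row when its first cell mentions "Outdated", otherwise collect the ".rpm" names from its third cell) is sketched below. It assumes jsoup is on the classpath; `RpmExtractor` and `currentRpms` are hypothetical names, and the sample HTML in `main` is a trimmed copy of the table above with the description cells shortened. The CSS selector `table.confluenceTable.tablesorter tr` requires both classes on the table, and `text()` flattens a cell holding a plain string, a `<span>`, or several `<p>` elements into one whitespace-separated string.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class RpmExtractor {

    /** Returns every ".rpm" name from column 3 of rows whose column 1 does not mention "Outdated". */
    public static List<String> currentRpms(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // Descendant selector: also reaches rows nested inside <tbody>.
        for (Element row : doc.select("table.confluenceTable.tablesorter tr")) {
            Elements tds = row.getElementsByTag("td");
            if (tds.size() < 3) {
                continue; // header or malformed row: no third cell to read
            }
            if (tds.get(0).text().contains("Outdated")) {
                continue; // skip outdated advisories
            }
            // Split the flattened third-cell text and keep only package file names.
            for (String token : tds.get(2).text().split("\\s+")) {
                if (token.endsWith(".rpm")) {
                    result.add(token);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<table class=\"confluenceTable tablesorter\"><tbody>"
                + "<tr><td>RHSA-2014:1172</td><td>description</td>"
                + "<td>procmail-3.22-17.1.2.x86_64.rpm</td><td></td></tr>"
                + "<tr><td>Outdated RHSA-2014:1166</td><td>description</td>"
                + "<td><p>jakarta-commons-httpclient-3.0-7jpp.4.el5_10.x86_64.rpm</p></td></tr>"
                + "<tr><td>RHSA-2014:1148-1</td><td>description</td>"
                + "<td><span>squid-2.6.STABLE21-7.el5_10.x86_64.rpm</span></td><td></td></tr>"
                + "</tbody></table>";
        for (String rpm : currentRpms(html)) {
            System.out.println(rpm);
        }
    }
}
```

With the sample data this prints `procmail-3.22-17.1.2.x86_64.rpm` and `squid-2.6.STABLE21-7.el5_10.x86_64.rpm`; the jakarta row is skipped. For the real page, the HTML string from the asker's `sendGetRequest` call would be passed to `currentRpms` instead.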
I need your help. I've tried many times and read posts here, but I can't get it to work. Thanks.
Could you fix this line: 'tds: element.getElementsByTag("td");'? It's wrong. – user3278908 2014-09-24 03:40:37
My typo, sorry. There was also a missing ';'. – yunandtidus 2014-09-24 07:37:19
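For reference, a declaration of `tds` that compiles is a typed local with straight quotes and a trailing semicolon: `Elements tds = element.getElementsByTag("td");`. The sketch below shows it in context, assuming jsoup is on the classpath; `AdvisoryCells` and `advisories` are hypothetical names, and the two-row table is a shortened copy of the question's data. It keeps the first-cell value (the advisory ID) of each non-outdated row together with its third-cell text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class AdvisoryCells {

    /** Maps the first-cell text of each non-outdated row to its third-cell text, in table order. */
    public static Map<String, String> advisories(String html) {
        Map<String, String> packagesByAdvisory = new LinkedHashMap<>();
        for (Element element : Jsoup.parse(html).select("table.confluenceTable tr")) {
            Elements tds = element.getElementsByTag("td"); // the line discussed in the comments
            if (tds.size() >= 3 && !tds.first().text().contains("Outdated")) {
                packagesByAdvisory.put(tds.first().text(), tds.get(2).text());
            }
        }
        return packagesByAdvisory;
    }

    public static void main(String[] args) {
        String html = "<table class=\"confluenceTable tablesorter\">"
                + "<tr><td>RHSA-2014:1172</td><td>desc</td><td>procmail-3.22-17.1.2.x86_64.rpm</td></tr>"
                + "<tr><td>Outdated RHSA-2014:1166</td><td>desc</td>"
                + "<td>jakarta-commons-httpclient-3.0-7jpp.4.el5_10.x86_64.rpm</td></tr>"
                + "</table>";
        System.out.println(advisories(html)); // {RHSA-2014:1172=procmail-3.22-17.1.2.x86_64.rpm}
    }
}
```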