Remove HTML elements and their content

I have to remove HTML elements together with their content, starting from Document doc = Jsoup.connect(someUrl).get() and Elements body = doc.select("div.chapter").

String myHtml =
      "<div class=\"chapter\">"
    + "    <h1>Hello this is my example</h1>"
    + "    <p>This is paragraph one</p>"
    + "    <p>This is paragraph two <sup class=\"num\">Nuisance 1</sup><span class=\"notes\">Nuisance 2</span></p>"
    + "    <p>This is paragraph three</p>"
    + "</div>";

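I want to remove the <sup> </sup> and <span> </span> elements, along with their content, from the HTML fragment extracted with Jsoup. I have read that using regular expressions for this is a bad idea. Most examples and answers I found solve a different problem: stripping the tags while keeping the content. What I want to obtain is: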

String newHtml =
      "<div class=\"chapter\">"
    + "    <h1>Hello this is my example</h1>"
    + "    <p>This is paragraph one</p>"
    + "    <p>This is paragraph two</p>"
    + "    <p>This is paragraph three</p>"
    + "</div>";

I have tried Jsoup without satisfactory results (it keeps the sup and span elements/tags).

Source

2013-08-06 Rod

`not` only removes elements that match the given query from the returned selection. It does not *descend* into each element. –

Please show us some effort! – Niranjan
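
The comment about `not` can be illustrated with a minimal sketch. It assumes Jsoup is on the classpath and uses a shortened copy of the question's markup; the class and method names (NotVersusRemove, filteredWithNot, cleanedWithRemove) are made up for illustration. Elements.not(...) only filters the list of selected elements, so the nested sup and span stay in the DOM, whereas Elements.remove() detaches the matched elements from the DOM.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class NotVersusRemove {
    // Shortened copy of the markup from the question.
    static final String HTML =
          "<div class=\"chapter\">"
        + "<p>This is paragraph two <sup class=\"num\">Nuisance 1</sup>"
        + "<span class=\"notes\">Nuisance 2</span></p>"
        + "</div>";

    // not(...) filters the selected list only. The div itself is neither a
    // sup nor a span, so it stays in the list with all its children intact.
    static String filteredWithNot() {
        Document doc = Jsoup.parse(HTML);
        Elements chapter = doc.select("div.chapter").not("sup, span");
        return chapter.outerHtml();
    }

    // remove() detaches the matched elements, content included, from the DOM.
    static String cleanedWithRemove() {
        Document doc = Jsoup.parse(HTML);
        doc.select("div.chapter sup, div.chapter span").remove();
        return doc.select("div.chapter").outerHtml();
    }

    public static void main(String[] args) {
        System.out.println(filteredWithNot());   // still contains the nuisance text
        System.out.println(cleanedWithRemove()); // nuisance elements are gone
    }
}
```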

After reading more (way more!) and trying different selectors, I adapted a solution to my own example:

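// Remove the unwanted elements (and their content) from the document first.
doc.getElementsByClass("notes").remove();
doc.getElementsByClass("num").remove();
// Then select the chapter; the removed elements are no longer part of it.
Elements newElement = doc.select("div.chapter");
String newHtml = newElement.toString();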
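
For readers who want to try this without a network connection, here is a self-contained sketch of the same approach. It assumes Jsoup is on the classpath and parses the question's string with Jsoup.parse instead of fetching a URL; ChapterCleaner and cleanChapter are hypothetical names, not part of Jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ChapterCleaner {
    // Hypothetical helper: removes every element carrying one of the given
    // class names, then returns the outer HTML of the div.chapter elements.
    static String cleanChapter(String html, String... classNames) {
        Document doc = Jsoup.parse(html);
        for (String className : classNames) {
            doc.getElementsByClass(className).remove();
        }
        return doc.select("div.chapter").outerHtml();
    }

    public static void main(String[] args) {
        String myHtml =
              "<div class=\"chapter\">"
            + "<h1>Hello this is my example</h1>"
            + "<p>This is paragraph one</p>"
            + "<p>This is paragraph two <sup class=\"num\">Nuisance 1</sup>"
            + "<span class=\"notes\">Nuisance 2</span></p>"
            + "<p>This is paragraph three</p>"
            + "</div>";
        System.out.println(cleanChapter(myHtml, "notes", "num"));
    }
}
```

Note that getElementsByClass is called on the whole document here, so elements with these class names outside div.chapter would be removed as well.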

Source

2013-08-07 01:01:38 Rod

Perhaps call remove() after selecting the sup elements:

doc.select("div > sup").remove();

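Here I used the child combinator, which fits your specific example. If the elements sit inside child elements of the div, you will have to adjust the selector.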
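
The difference between the child combinator and the descendant combinator can be checked with a small sketch. It assumes Jsoup is on the classpath and uses a shortened version of the markup shown in the question, where the sup is nested inside a p; CombinatorDemo and countMatches are illustrative names.

```java
import org.jsoup.Jsoup;

public class CombinatorDemo {
    // In this markup the sup is a child of p, not a direct child of the div.
    static final String HTML =
          "<div class=\"chapter\">"
        + "<p>This is paragraph two <sup class=\"num\">Nuisance 1</sup></p>"
        + "</div>";

    // Returns how many elements the given CSS query matches in HTML.
    static int countMatches(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println(countMatches("div > sup")); // 0: sup is not a direct child of div
        System.out.println(countMatches("div sup"));   // 1: sup is a descendant of div
        System.out.println(countMatches("p > sup"));   // 1: sup is a direct child of p
    }
}
```

With markup like this, "div sup" or "p > sup" would be the selector to pass to select(...) before calling remove().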

Source

2013-08-06 16:30:19

body.select("p > sup.num, p > span.notes").remove(); 
System.out.println(body.html());

This should work perfectly in your case.
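
One detail worth checking is which HTML accessor to print afterwards. The sketch below assumes Jsoup is on the classpath and uses a shortened copy of the question's markup; InnerVersusOuter and cleanedChapter are illustrative names. Elements.html() returns the inner HTML of the selected elements, so the wrapping div is not part of the result, while Elements.outerHtml() includes it, which is closer to the output the question asks for.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class InnerVersusOuter {
    static final String HTML =
          "<div class=\"chapter\">"
        + "<p>This is paragraph two <sup class=\"num\">Nuisance 1</sup>"
        + "<span class=\"notes\">Nuisance 2</span></p>"
        + "</div>";

    // Selects the chapter and removes the unwanted children of its paragraphs.
    static Elements cleanedChapter() {
        Document doc = Jsoup.parse(HTML);
        Elements body = doc.select("div.chapter");
        body.select("p > sup.num, p > span.notes").remove();
        return body;
    }

    public static void main(String[] args) {
        Elements body = cleanedChapter();
        System.out.println(body.html());      // inner HTML: paragraphs only
        System.out.println(body.outerHtml()); // outer HTML: includes the div wrapper
    }
}
```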

Source

2013-08-06 19:52:12 Niranjan
