Getting all elements, including untagged text, with Jsoup

I'm trying to get all the elements inside HTML like this:

<body> 
My text without tag 
<br>Some title</br> 
<img class="image" src="url"> 
My second text without tag 
<p>Some Text</p> 
<p class="MsoNormal">Some text</p> 
<ul> 
<li>1</li> 
<li>2</li> 
</ul> 
</body>

I need to get all the elements, including the text that has no tag around it. How can I do that?

P.S.: I need the result as an `Elements` array that contains every one of these elements.


2015-06-10 aef67

*Parts without tags* are still inside some tag, aren't they? –

What do you think this "*element*" should look like? What should its content be? – Pshemo

@Pshemo I think this could be the `Elements` class holding all of these elements. – aef67

I'm not quite sure whether you are asking to retrieve all of the text in the HTML. If so, you can simply do the following:

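import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<body>My text without tag <p>Some Text</p></body>"; // your HTML code
Document doc = Jsoup.parse(html); // parse the string
System.out.println(doc.text()); // all text in the document, tagged or not, as one string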

OUTPUT (for the HTML in the question):

My text without tag Some title My second text without tag Some Text Some text 1 2
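`doc.text()` merges everything into a single string. If the untagged pieces need to be kept apart, one option is to walk the direct child nodes of `<body>` and keep only the `TextNode`s, since jsoup represents untagged text as text nodes rather than elements. This is a minimal sketch assuming jsoup is on the classpath; the class `UntaggedText` and method `untaggedText` are hypothetical names, and only text sitting directly under `<body>` is collected.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
public class UntaggedText {

    // Collects the non-blank text nodes that are direct children of <body>,
    // i.e. text that is not wrapped in a tag of its own.
    static List<String> untaggedText(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Node node : doc.body().childNodes()) {
            if (node instanceof TextNode) {
                TextNode text = (TextNode) node;
                // Skip whitespace-only nodes such as the line breaks between tags.
                if (!text.isBlank()) {
                    result.add(text.text().trim());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<body>My text without tag <p>Some Text</p> My second text without tag <ul><li>1</li></ul></body>";
        System.out.println(untaggedText(html));
    }
}
```

With the question's exact markup, jsoup's HTML5 parser should treat the stray `</br>` as another `<br>`, so "Some title" would most likely be reported as untagged text as well.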


2015-06-10 10:58:48 nafas

In case you are working with an HTML file, you can use the code below and retrieve whichever tags you need. The API is Jsoup; you can find more examples at http://jsoup.org/

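import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

File input = new File(htmlFilePath); // htmlFilePath: path to your HTML file
Document htmlDoc = Jsoup.parse(input, "UTF-8"); // read and parse the file (throws IOException)

Elements pElements = htmlDoc.select("p"); // every <p> element in the document
Element pElement1 = pElements.get(0); // the first <p>; throws if there is none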
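Selecting by tag returns only elements, and jsoup's `Elements` list can hold only `Element` objects, not text nodes. One way to get the single `Elements` list asked for in the question's P.S. is to wrap each untagged top-level text node in an element first. This is a sketch under stated assumptions: the `<span>` wrapper and the names `AllTopLevelElements` and `topLevelElements` are hypothetical choices, and only direct children of `<body>` are handled.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
public class AllTopLevelElements {

    // Wraps each non-blank top-level text node of <body> in a <span>
    // (an illustrative choice), so every top-level item becomes an Element.
    static Elements topLevelElements(String html) {
        Document doc = Jsoup.parse(html);
        Element body = doc.body();
        // childNodes() is read-only and wrapping changes the tree, so iterate over a copy.
        List<Node> children = new ArrayList<>(body.childNodes());
        for (Node node : children) {
            if (node instanceof TextNode && !((TextNode) node).isBlank()) {
                node.wrap("<span></span>");
            }
        }
        // children() returns only Element children, now including the new spans.
        return body.children();
    }

    public static void main(String[] args) {
        String html = "<body>My text without tag <p>Some Text</p><ul><li>1</li><li>2</li></ul></body>";
        for (Element e : topLevelElements(html)) {
            System.out.println(e.tagName() + ": " + e.text());
        }
    }
}
```

Note that this modifies the parsed document; parse a fresh copy if the original tree is still needed afterwards.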


2015-06-10 11:11:53
