Extracting text

I have the following HTML code:

<div class=example>Text #1</div> "Another Text 1" 
<div class=example>Text #2</div> "Another Text 2"

I want to extract the text outside the tags: "Another Text 1" and "Another Text 2".

I am using JSoup to do this.

Any ideas?

Thanks!


2013-11-09 johnny243

You can select the next Node (not Element!) after each div tag. In your example, these are all TextNodes.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

final String html = "<div class=example>Text #1</div> \"Another Text 1\"\n"
        + "<div class=example>Text #2</div> \"Another Text 2\" ";

Document doc = Jsoup.parse(html);

for (Element element : doc.select("div.example")) { // Select all the div tags
    // Get the node after each div; the cast assumes it is a TextNode, as it is in this HTML
    TextNode next = (TextNode) element.nextSibling();

    System.out.println(next.text()); // Print the text of the TextNode
}

Output:

"Another Text 1" 
"Another Text 2"

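The cast in the loop above relies on every div being followed directly by text. A slightly more defensive sketch, assuming the same HTML as in the question, checks the node type first and trims the result. The class name `SiblingText` and the helper `textAfterDivs` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class SiblingText {

    // Hypothetical helper: collects the text that directly follows each div.example.
    static List<String> textAfterDivs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();

        for (Element div : doc.select("div.example")) {
            Node next = div.nextSibling(); // may be null or a non-text node
            if (next instanceof TextNode) {
                String text = ((TextNode) next).text().trim();
                if (!text.isEmpty()) {
                    result.add(text); // skip whitespace-only nodes
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<div class=example>Text #1</div> \"Another Text 1\"\n"
                + "<div class=example>Text #2</div> \"Another Text 2\"";

        for (String text : textAfterDivs(html)) {
            System.out.println(text);
        }
    }
}
```

With this check, a div followed by another element (or by nothing) is skipped instead of throwing a ClassCastException or NullPointerException.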

2013-11-10 18:51:12 ollo

Thank you very much! – johnny243

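Another solution is to use the ownText() method (see the Jsoup docs). This method returns only the text owned by the specified element and ignores any text owned by its child elements.

Using only the HTML you provided, you can extract the own text of <body>: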

String html = "<div class='example'>Text #1</div> 'Another Text 1'<div class='example'>Text #2</div> 'Another Text 2'"; 

Document doc = Jsoup.parse(html); 
System.out.println(doc.body().ownText());

This will output:

'Another Text 1' 'Another Text 2'

Note that the ownText() method can be called on any Element. The docs have another example.
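As a rough illustration of calling ownText() on an element other than <body>, the sketch below wraps the markup in a container div. The wrapper div, its id, and the names `OwnTextExample` and `directTextPieces` are assumptions added for this example and are not part of the original HTML. It also uses Element.textNodes() to get the pieces separately, since ownText() joins them into a single string.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class OwnTextExample {

    // Hypothetical helper: the element's direct text nodes as separate, trimmed strings.
    static List<String> directTextPieces(Element element) {
        List<String> pieces = new ArrayList<>();
        for (TextNode node : element.textNodes()) {
            if (!node.isBlank()) {
                pieces.add(node.text().trim());
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        // The wrapper div and its id are assumptions added for illustration.
        String html = "<div id='wrapper'>"
                + "<div class='example'>Text #1</div> Another Text 1 "
                + "<div class='example'>Text #2</div> Another Text 2"
                + "</div>";

        Document doc = Jsoup.parse(html);
        Element wrapper = doc.getElementById("wrapper");

        // Text owned by the wrapper itself, joined into one string
        System.out.println(wrapper.ownText());

        // The same text, one entry per text node
        System.out.println(directTextPieces(wrapper));
    }
}
```

Here ownText() should give "Another Text 1 Another Text 2" as one string, while the helper keeps the two pieces apart, which is closer to what the question asks for.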


2013-11-10 00:28:29 ashatte
