Parse a string to get content

I have the following HTML string:

<h3>I only want this content</h3> I don't want this content <b>random content</b>

I want to get only the content of the H3 tag and discard everything else. I have the following:

String getArticleBody = listArt.getChildText("body"); 
StringBuilder mainArticle = new StringBuilder(); 
String getSubHeadlineFromArticle; 

if(getArticleBody.startsWith("<h3>") && getArticleBody.endsWith("</h3>")){ 
    mainArticle.append(getSubHeadlineFromArticle); 
}

However, this returns the entire content, which is not what I'm after. If anyone could help me, it would be much appreciated.

Source

2014-07-18 Ergun Polat

You need to store the content.

See: http://stackoverflow.com/questions/16597303/extract-string-between-two-strings-in-java

You can use the substring method like this:

String a="<h3>I only want this content</h3> I don't want this content <b>random content</b>"; 
System.out.println(a.substring(a.indexOf("<h3>")+4,a.indexOf("</h3>")));

Output:

I only want this content
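One caveat with the one-liner above: indexOf returns -1 when a tag is missing, and substring then throws a StringIndexOutOfBoundsException. A minimal sketch of a guarded version follows; the class name H3Substring and the helper extractFirstH3 are made up for illustration and are not part of the original answer.

```java
final class H3Substring {
    // Hypothetical helper: returns the text inside the first <h3>...</h3> pair,
    // or null when the opening or closing tag is missing.
    static String extractFirstH3(String html) {
        final String open = "<h3>";
        final String close = "</h3>";
        int start = html.indexOf(open);
        if (start < 0) {
            return null; // no opening tag
        }
        start += open.length();
        // Look for the closing tag only after the opening tag.
        int end = html.indexOf(close, start);
        if (end < 0) {
            return null; // opening tag without a closing tag
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) {
        String a = "<h3>I only want this content</h3> I don't want this content <b>random content</b>";
        System.out.println(extractFirstH3(a));                 // prints: I only want this content
        System.out.println(extractFirstH3("no heading here")); // prints: null
    }
}
```

Returning null is just one choice; an empty string or java.util.Optional would work as well, depending on what the calling code expects.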

Source

2014-07-18 11:06:47

Try this:

String result = getArticleBody.substring(getArticleBody.indexOf("<h3>"), getArticleBody.indexOf("</h3>")) 
       .replaceFirst("<h3>", ""); 
System.out.println(result);

Source

2014-07-18 11:07:12

You need to use a regular expression. Try this:

import java.util.regex.Matcher; 
import java.util.regex.Pattern; 

public class Main { // class name is arbitrary 
    public static void main(String[] args) { 
        String str = "<h3>asdfsdafsdaf</h3>dsdafsdfsafsadfa<h3>second</h3>"; 
        // your pattern goes here 
        // ? is important since you need to catch the nearest closing tag 
        Pattern pattern = Pattern.compile("<h3>(.+?)</h3>"); 
        Matcher matcher = pattern.matcher(str); 
        while (matcher.find()) System.out.println(matcher.group(1)); 
    } 
}

matcher.group(1) returns exactly the text between the h3 tags.
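If the headings are needed as values rather than printed, the same loop can collect them into a list. The sketch below is one possible variant, not the answerer's code: the class name H3Regex and the method extractAllH3 are hypothetical, the CASE_INSENSITIVE and DOTALL flags are additions so that `<H3>` and headings containing line breaks also match, and `.*?` is used instead of `.+?` so that an empty heading does not swallow the following text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class H3Regex {
    // CASE_INSENSITIVE also matches <H3>; DOTALL lets "." match line breaks.
    // The reluctant quantifier (*?) stops at the nearest closing tag.
    private static final Pattern H3 =
            Pattern.compile("<h3>(.*?)</h3>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: returns the text of every h3 element, in document order.
    static List<String> extractAllH3(String html) {
        List<String> results = new ArrayList<>();
        Matcher matcher = H3.matcher(html);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String str = "<h3>asdfsdafsdaf</h3>dsdafsdfsafsadfa<H3>second</H3>";
        System.out.println(extractAllH3(str));
    }
}
```

This still assumes bare `<h3>` tags without attributes. For markup of unknown shape, an HTML parser is usually the more robust choice.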

Source

2014-07-18 11:08:22 ferrerverck

Use a regular expression. This may help you:

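// requires: import java.util.regex.Matcher; import java.util.regex.Pattern; 
String str = "<h3>I only want this content</h3> I don't want this content <b>random content</b>"; 
final Pattern pattern = Pattern.compile("<h3>(.+?)</h3>"); 
final Matcher matcher = pattern.matcher(str); 
// group(1) throws IllegalStateException unless find() has returned true 
if (matcher.find()) { 
    System.out.println(matcher.group(1)); // prints the text between the h3 tags 
}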

Output:

I only want this content

Source

2014-07-18 11:08:31

Thanks, guys. All of your answers work, but I ended up using Jsoup.

String getArticleBody = listArt.getChildText("body"); 
org.jsoup.nodes.Document docc = Jsoup.parse(getArticleBody); 
org.jsoup.nodes.Element h3Tag = docc.getElementsByTag("h3").first(); 
String getSubHeadlineFromArticle = h3Tag.text();

Source

2014-07-18 11:12:36

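The other answers have already covered how to get the result you want. I'm going to comment on your code to explain why it doesn't do that. (Note that I renamed your variables, because strings don't get anything; they are things.)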

// declare a bunch of variables 
String articleBody = listArt.getChildText("body"); 
StringBuilder mainArticle = new StringBuilder(); 
String subHeadlineFromArticle; 

// check to see if the article body consists entirely of a subheadline 
if(articleBody.startsWith("<h3>") && articleBody.endsWith("</h3>")){ 
    // if it does, append subHeadlineFromArticle, which was never assigned a value 
    // (as a local variable this does not compile; as a field it would be null) 
    mainArticle.append(subHeadlineFromArticle); 
} 
// if it doesn't, don't do anything 

// final result: 
// articleBody = the entire article body 
// mainArticle = no headline text (regardless of whether you appended anything) 
// subHeadlineFromArticle = never assigned
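For comparison, here is a minimal sketch of how the snippet might look once both problems are addressed: the subheadline is extracted and assigned before it is appended, and the condition checks that the tags are present rather than that the whole body starts and ends with them. The class name SubHeadlineFix and the method buildMainArticle are hypothetical, and a string literal stands in for listArt.getChildText("body").

```java
final class SubHeadlineFix {
    // Hypothetical helper: returns the text of the first h3 element,
    // or an empty string when the body has no <h3>...</h3> pair.
    static String buildMainArticle(String articleBody) {
        StringBuilder mainArticle = new StringBuilder();
        int start = articleBody.indexOf("<h3>");
        int end = articleBody.indexOf("</h3>");
        // The tags may appear anywhere in the body, so test their positions
        // instead of startsWith/endsWith.
        if (start >= 0 && end > start) {
            // Assign the extracted text first, then append it.
            String subHeadlineFromArticle = articleBody.substring(start + "<h3>".length(), end);
            mainArticle.append(subHeadlineFromArticle);
        }
        return mainArticle.toString();
    }

    public static void main(String[] args) {
        // Stand-in for listArt.getChildText("body") from the question.
        String articleBody = "<h3>I only want this content</h3> I don't want this content <b>random content</b>";
        System.out.println(buildMainArticle(articleBody)); // prints: I only want this content
    }
}
```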

Source

2014-07-18 11:25:28 octothorpentine
