2014-07-18 99 views
0

Parse a string to get the content

I have the following HTML string:

<h3>I only want this content</h3> I don't want this content <b>random content</b> 

I want to get only the content of the h3 tag and discard everything else. I have the following:

String getArticleBody = listArt.getChildText("body"); 
StringBuilder mainArticle = new StringBuilder(); 
String getSubHeadlineFromArticle; 

if(getArticleBody.startsWith("<h3>") && getArticleBody.endsWith("</h3>")){ 
    mainArticle.append(getSubHeadlineFromArticle); 
} 

However, this returns the entire content, which is not what I'm after. If anyone could help me, it would be much appreciated.

+0

You need to store the content.

+0

See: http://stackoverflow.com/questions/16597303/extract-string-between-two-strings-in-java

Answers

0

You can use the substring method like this:

String a="<h3>I only want this content</h3> I don't want this content <b>random content</b>"; 
System.out.println(a.substring(a.indexOf("<h3>")+4,a.indexOf("</h3>"))); 

Output:

I only want this content 
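
The one-liner above assumes both tags are present; `indexOf` returns -1 when a tag is missing, and `substring` then throws. A minimal sketch of a guarded variant follows. The class name `TagText` and the method `between` are hypothetical names chosen for illustration, not part of the original answer.

```java
import java.util.Optional;

final class TagText {
    private TagText() {}

    // Returns the text between the first openTag and the next closeTag,
    // or Optional.empty() when either tag is missing.
    static Optional<String> between(String html, String openTag, String closeTag) {
        int start = html.indexOf(openTag);
        if (start < 0) {
            return Optional.empty();
        }
        int contentStart = start + openTag.length();
        // Search for the closing tag only after the opening tag.
        int end = html.indexOf(closeTag, contentStart);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(html.substring(contentStart, end));
    }

    public static void main(String[] args) {
        String a = "<h3>I only want this content</h3> I don't want this content <b>random content</b>";
        System.out.println(between(a, "<h3>", "</h3>").orElse("(no h3 found)"));
    }
}
```

Returning an `Optional` makes the caller decide what to do when the heading is absent, instead of failing with a `StringIndexOutOfBoundsException`.
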
0

Try this:

String result = getArticleBody.substring(getArticleBody.indexOf("<h3>"), getArticleBody.indexOf("</h3>")) 
       .replaceFirst("<h3>", ""); 
System.out.println(result); 
0

You need to use a regular expression. Try this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "<h3>asdfsdafsdaf</h3>dsdafsdfsafsadfa<h3>second</h3>";
        // Capture the text between each <h3> and its closing tag.
        // The ? makes the quantifier reluctant, so it stops at the nearest closing tag.
        Pattern pattern = Pattern.compile("<h3>(.+?)</h3>");
        Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }
}

matcher.group(1) returns exactly the text between the h3 tags.
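
If the matches are needed as values rather than printed, the same loop can collect them into a list. This is a sketch under a few assumptions of mine: the class name `H3Matches` and method `findAll` are hypothetical, the quantifier is `.*?` so that an empty heading also matches, and the `DOTALL` and `CASE_INSENSITIVE` flags are additions not present in the answer. It still does not handle tags with attributes such as `<h3 class="x">`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class H3Matches {
    // DOTALL lets '.' match line breaks; CASE_INSENSITIVE also accepts <H3>.
    private static final Pattern H3 =
            Pattern.compile("<h3>(.*?)</h3>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    private H3Matches() {}

    // Returns the inner text of every <h3>...</h3> pair, in document order.
    static List<String> findAll(String html) {
        List<String> results = new ArrayList<>();
        Matcher matcher = H3.matcher(html);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String str = "<h3>asdfsdafsdaf</h3>dsdafsdfsafsadfa<h3>second</h3>";
        System.out.println(findAll(str));
    }
}
```
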

0

Use a regular expression.
This may help you:

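// Requires java.util.regex.Pattern and java.util.regex.Matcher.
String str = "<h3>I only want this content</h3> I don't want this content <b>random content</b>";
final Pattern pattern = Pattern.compile("<h3>(.+?)</h3>");
final Matcher matcher = pattern.matcher(str);
// group(1) is only valid after a successful find(), so check the result.
if (matcher.find()) {
    System.out.println(matcher.group(1)); // Prints the text I want to extract
}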

Output:

I only want this content 
1

Thanks, guys. All of your answers work, but I ended up using Jsoup.

String getArticleBody = listArt.getChildText("body"); 
org.jsoup.nodes.Document docc = Jsoup.parse(getArticleBody); 
org.jsoup.nodes.Element h3Tag = docc.getElementsByTag("h3").first(); 
String getSubHeadlineFromArticle = h3Tag.text(); 
0

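The other answers already cover how to get the result you want. I'm going to annotate your code to explain why it doesn't do that. (Note that I renamed your variables, because strings don't get anything; they are things.)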

// declare a bunch of variables 
String articleBody = listArt.getChildText("body"); 
StringBuilder mainArticle = new StringBuilder(); 
String subHeadlineFromArticle; 

// check to see if the article body consists entirely of a subheadline 
if(articleBody.startsWith("<h3>") && articleBody.endsWith("</h3>")){ 
    // if it does, append subHeadlineFromArticle, which was never assigned a value
    // (if these are local variables, javac rejects this read of an uninitialized variable)
    mainArticle.append(subHeadlineFromArticle);
}
// if it doesn't, don't do anything

// final result:
// articleBody = the entire article body
// mainArticle = nothing extracted from articleBody was ever appended to it
// subHeadlineFromArticle = never assigned
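
For comparison, here is a minimal sketch of the same flow with the missing step filled in: the sub-headline is extracted and assigned before it is appended. The class name `SubHeadlineDemo` and method `buildMainArticle` are hypothetical, and a hard-coded string stands in for `listArt.getChildText("body")`, since the type of `listArt` is not shown in the question.

```java
final class SubHeadlineDemo {
    private static final String OPEN = "<h3>";
    private static final String CLOSE = "</h3>";

    private SubHeadlineDemo() {}

    // Returns the text of the first <h3> element, or an empty string if there is none.
    static String buildMainArticle(String articleBody) {
        StringBuilder mainArticle = new StringBuilder();
        int start = articleBody.indexOf(OPEN);
        // Look for the closing tag only after the opening tag was found.
        int end = start < 0 ? -1 : articleBody.indexOf(CLOSE, start + OPEN.length());
        if (end >= 0) {
            // Assign the extracted text first, then append it.
            String subHeadlineFromArticle = articleBody.substring(start + OPEN.length(), end);
            mainArticle.append(subHeadlineFromArticle);
        }
        return mainArticle.toString();
    }

    public static void main(String[] args) {
        // Stand-in for listArt.getChildText("body").
        String articleBody =
                "<h3>I only want this content</h3> I don't want this content <b>random content</b>";
        System.out.println(buildMainArticle(articleBody));
    }
}
```

The check uses `indexOf` rather than `startsWith` and `endsWith`, because the sample body does not end with `</h3>`, so the original condition would be false for it.
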