Java regex to match the value of a meta tag's content attribute

I have a regular expression that I want to match an HTML meta tag's content attribute and capture its value. For example:

<meta name="description" content="Some website description.">

In this case I want to get

Some website description.

and nothing more. In my case, I am using this pattern:

private static Pattern siteMetaTagDescriptionAttributePattern = Pattern.compile("name=\"description\"(\\s*)content=\"(.*)\""); 
Matcher matcher = siteMetaTagDescriptionAttributePattern.matcher(siteContentLine); 
String siteDescription = ""; 
while(matcher.find()) { 
    siteDescription = matcher.group(2); 
}

and it captures everything up to the end of the line, which in this case is:

Some website description.">

What should I do to get only the text inside the content attribute, which in this case is

Some website description.

Thanks a lot.

Source

2014-02-10 George

Consider using Jsoup if you are extracting data from here and there in the page. – nhahtdh

[Obligatory "don't do this" link](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). –

Hi George. See @BoristheSpider's link. Matching HTML with a regex is hard; however, you could try adding `*(">)` at the end of your expression to see whether it works. –

Consider using a parser instead of a regex. You can use, for example, Jsoup, like this:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<meta name=\"description\" content=\"Some website description.\">";

Document doc = Jsoup.parse(html);
System.out.println(doc.select("meta[name=description]").attr("content"));

Output:

Some website description.

Source

2014-02-10 20:24:54 Pshemo

@downvoter If you found something wrong with this answer, could you say what it is? I would like to improve or correct it. – Pshemo

If you insist:

(?<=name=\"description\" content=\")[^\"]*(?=\")
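As a rough sketch, this is how the pattern above might be used from Java. The class and method names (`MetaDescriptionRegex`, `extractWithGroup`, `extractWithLookaround`) are made up for illustration. The capture-group variant is not part of this answer; it is the question's own pattern with the greedy `(.*)` swapped for `[^"]*`, which stops at the first closing quote. Both assume the attributes appear in exactly this order and use double quotes, which real HTML does not guarantee.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class MetaDescriptionRegex {

    // The question's pattern with the greedy (.*) replaced by [^"]*,
    // so the capture cannot run past the attribute's closing quote.
    private static final Pattern CAPTURE_GROUP = Pattern.compile(
            "name=\"description\"\\s*content=\"([^\"]*)\"");

    // The lookaround pattern from this answer, written as a Java string literal.
    // It requires exactly one space between the two attributes.
    private static final Pattern LOOKAROUND = Pattern.compile(
            "(?<=name=\"description\" content=\")[^\"]*(?=\")");

    // Returns the first captured description, or "" when nothing matches.
    static String extractWithGroup(String line) {
        Matcher m = CAPTURE_GROUP.matcher(line);
        return m.find() ? m.group(1) : "";
    }

    // Returns the whole match (the lookarounds consume nothing), or "".
    static String extractWithLookaround(String line) {
        Matcher m = LOOKAROUND.matcher(line);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String line = "<meta name=\"description\" content=\"Some website description.\">";
        System.out.println(extractWithGroup(line));
        System.out.println(extractWithLookaround(line));
    }
}
```

Either way, the negated character class `[^"]*` is what keeps the match from swallowing later quotes on the same line.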

Source

2014-02-10 20:25:02 tenub

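Nice regex, but I think you also need to escape the `"` in the character set `[^"]`, because he has the pattern enclosed in double quotes. I think it should look like this: `Pattern.compile("(?<=name=\"description\" content=\")[^\"]*(?=\")");` – MElliott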
