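If I understand correctly, you are looking to extract substrings delimited by double quotes ("). You could use a regular expression with a capturing group: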
String text = "Vulcans are a humanoid species in the fictional \"Star Trek\"" +
    " universe who evolved on the planet Vulcan and are noted for their " +
    "attempt to live by reason and logic with no interference from emotion." +
    " They were the first extraterrestrial species officially to make first" +
    " contact with Humans and later became one of the founding members of the" +
    " \"United Federation of Planets\"";
// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String[] entities = new String[10]; // An array to hold matched substrings
Pattern pattern = Pattern.compile("\"(.*?)\""); // The regex: lazily capture everything between a pair of " characters
Matcher matcher = pattern.matcher(text); // The matcher that runs the regex over our text
int count = 0; // An index for the array of matches
while (count < entities.length && matcher.find()) { // find() resumes after the previous match and returns false when none remain
    entities[count++] = matcher.group(1); // Add the match to the array, without its " marks
}
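As a minimal sketch of the same regex idea packaged as a reusable method, the version below collects matches into a `List` so there is no fixed array size to outgrow. The class name `QuotedRegexExtractor` and the method name `extractQuoted` are made up for illustration. Note that `.` does not match line terminators by default, so a quoted phrase that spans lines would need `Pattern.DOTALL`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this example.
class QuotedRegexExtractor {
    // Lazily matches the shortest run of characters between two " characters.
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    // Returns every double-quoted substring of text, without the quote marks.
    static List<String> extractQuoted(String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(text);
        while (matcher.find()) {            // false once no further pair of quotes exists
            matches.add(matcher.group(1));  // group 1 is the text between the quotes
        }
        return matches;
    }
}
```

Calling `QuotedRegexExtractor.extractQuoted(text)` on the sample text should give a list holding `Star Trek` and `United Federation of Planets`. A trailing unpaired `"` simply produces no further match.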
Or write a "parser" with string functions:
int startFrom = text.indexOf('"'); // The index position of the first " character
int nextQuote = text.indexOf('"', startFrom + 1); // The index position of the next " character
int count = 0; // An index for the array of matches
while (startFrom > -1 && nextQuote > -1 && count < entities.length) { // Keep looping while a complete pair of " characters remains and the array has room
    entities[count++] = text.substring(startFrom + 1, nextQuote); // Retrieve the substring and add it to the array
    startFrom = text.indexOf('"', nextQuote + 1); // Find the next opening " character after nextQuote
    nextQuote = (startFrom > -1) ? text.indexOf('"', startFrom + 1) : -1; // Find its closing " character, if there is an opening one
}
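The string-function approach can also be sketched as a self-contained method. This is only one way to arrange it: the names `QuotedIndexOfExtractor` and `extractQuoted` are hypothetical, and the method stops quietly at an unpaired opening quote rather than reporting it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name is an assumption for this example.
class QuotedIndexOfExtractor {
    // Returns every double-quoted substring of text, using only indexOf and substring.
    static List<String> extractQuoted(String text) {
        List<String> matches = new ArrayList<>();
        int open = text.indexOf('"');                    // first opening quote, or -1
        while (open != -1) {
            int close = text.indexOf('"', open + 1);     // its closing quote, or -1
            if (close == -1) {
                break;                                   // unpaired opening quote: stop here
            }
            matches.add(text.substring(open + 1, close)); // text between the two quotes
            open = text.indexOf('"', close + 1);         // next opening quote, or -1
        }
        return matches;
    }
}
```

Looking up the closing quote inside the loop keeps the two indexes in step, so `substring` is never called with a negative end index.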
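In both of these, the sample text is hard-coded for the sake of the example, and the same variable is assumed to exist (the String variable named text).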
If you want to test the contents of the entities array:
int i = 0;
while (i < count) {
    System.out.println(entities[i]);
    i++;
}
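I have to warn you that there may be issues with edge/boundary cases (i.e. when the " character is at the very start or end of the string). These examples may not behave as expected if the parity of the " characters is uneven (i.e. if there is an odd number of " characters in the text). You could use a simple parity check beforehand: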
static int countQuoteChars(String text) {
    int nextQuote = text.indexOf('"'); // Find the first " character
    int count = 0; // A counter for " characters found
    while (nextQuote != -1) { // While there is another " character ahead
        count++; // Increase the count by 1
        nextQuote = text.indexOf('"', nextQuote + 1); // Find the next " character
    }
    return count; // Return the result
}
static boolean quoteCharacterParity(int numQuotes) {
    return numQuotes % 2 == 0; // true if the number of " characters is even, false if it is odd
}
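Note that if numQuotes happens to be 0, this method still returns true (because 0 modulo any non-zero number is 0, so (numQuotes % 2 == 0) will be true), though you won't want to go ahead with the parsing if there are no " characters, so you'll want to check for this condition somewhere.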
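One possible place for that check is a small guard that combines both conditions before any parsing starts. This is a sketch, not the only way to do it: `QuoteGuard` and `isSafeToParse` are names invented for this example, while `countQuoteChars` follows the counting logic described above.

```java
// Hypothetical guard class; QuoteGuard and isSafeToParse are assumed names.
class QuoteGuard {
    // Counts the " characters in text.
    static int countQuoteChars(String text) {
        int count = 0;
        int nextQuote = text.indexOf('"');
        while (nextQuote != -1) {                          // another " character was found
            count++;
            nextQuote = text.indexOf('"', nextQuote + 1);  // look for the one after it
        }
        return count;
    }

    // True only when there is at least one " character and the total is even.
    static boolean isSafeToParse(String text) {
        int numQuotes = countQuoteChars(text);
        return numQuotes > 0 && numQuotes % 2 == 0;
    }
}
```

With this in place, the extraction code can be wrapped in `if (QuoteGuard.isSafeToParse(text)) { ... }`, and the zero-quote and odd-quote cases can be handled separately in the `else` branch.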
Hope this helps!
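You may want to consider the various typographical conventions in a semantic analysis, rather than stripping the markup. Is it correct to infer that you have explicitly quoted the phrases you want to correlate in otherwise unmarked text? – trashgod 2010-09-04 15:18:01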