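Return a specified number of words before and after a given position in a text
I am having a big problem with the following code. I expect it to return n words before and after a found keyword (the needle), but it never does.
If I have a text, say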
"There is a lot of interesting stuff going on, when someone tries to find the needle in the haystack. Especially if there is anything to see blah blah blah".
and I have a regular expression like this:
"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}\b)needle(\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"
shouldn't this match the needle in the given string exactly and return the text
someone tries to find the needle in the haystack. Especially if
It never does :-( When executed, my method always returns an empty string, even though I know for certain that the keyword is in the given text.
private String trimStringAtWordBoundary(String haystack, int wordsBefore, int wordsAfter, String needle) {
    if(haystack == null || haystack.trim().isEmpty()){
        return haystack ;
    }
    String textsegments = "";
    String patternString = "((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,"+wordsBefore+"}\b)" + needle + "(\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,"+wordsAfter+"})";
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(haystack);
    logger.trace(">>> using regular expression: " + matcher.toString());
    while(matcher.find()){
        logger.trace(">>> found you between " + matcher.regionStart() + " and " + matcher.regionEnd());
        String segText = matcher.group(0); // as well tried it with group(1)
        textsegments += segText + "...";
    }
    return textsegments;
}
Clearly the problem lies in my regular expression, but I can't figure out what is wrong with it.
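It doesn't look like you account for whitespace characters within your expression. Typically you would use '\s' where you have '\b', and also in the character classes before/after it... something like '"((?:[\w'\.-]+\s){0," + wordsBefore + "})"' and similar for the part after... – abiessu 2014-09-30 20:28:44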
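One thing that may be worth checking in the method above: inside a Java string literal, "\b" is the escape for the backspace character (U+0008), not the regex word-boundary token. The compiled pattern then requires a literal backspace next to the needle, which would explain why nothing is ever found. Writing "\\b" passes the word boundary through to the regex engine. Below is a minimal sketch under that assumption. The class name KeywordContext is hypothetical, the logger calls are left out, and Pattern.quote is added on the guess that the needle should be treated as literal text. If positions need to be logged, matcher.start() and matcher.end() report the current match, whereas regionStart() and regionEnd() report the bounds of the searched region.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder class for the sketch; not part of the original code.
final class KeywordContext {

    // Returns every occurrence of needle together with up to wordsBefore words
    // before it and up to wordsAfter words after it, each followed by "...".
    static String trimStringAtWordBoundary(String haystack, int wordsBefore,
                                           int wordsAfter, String needle) {
        if (haystack == null || haystack.trim().isEmpty()) {
            return haystack;
        }
        String word = "[a-zA-Z'-]+";   // same word class as in the question
        String gap = "[^a-zA-Z'-]+";   // anything that separates two words
        // "\\b" in Java source becomes \b in the regex (word boundary).
        // A single "\b" would be a backspace character instead.
        String patternString =
                "((?:" + word + gap + "){0," + wordsBefore + "}\\b)"
                + Pattern.quote(needle) // assumption: needle is literal text
                + "(\\b(?:" + gap + word + "){0," + wordsAfter + "})";
        Matcher matcher = Pattern.compile(patternString).matcher(haystack);

        StringBuilder segments = new StringBuilder();
        while (matcher.find()) {
            // group(0) is the whole match: words before + needle + words after
            segments.append(matcher.group(0)).append("...");
        }
        return segments.toString();
    }

    public static void main(String[] args) {
        String text = "There is a lot of interesting stuff going on, when someone "
                + "tries to find the needle in the haystack. Especially if there "
                + "is anything to see blah blah blah";
        // prints: someone tries to find the needle in the haystack. Especially if...
        System.out.println(trimStringAtWordBoundary(text, 5, 5, "needle"));
    }
}
```

With the sample text from the question and 5 words on each side, this should give the segment the question expects, followed by the "..." separator. The character classes are unchanged from the question, so a word containing digits or non-ASCII letters still counts as a separator.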