2012-07-19 65 views
3

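Java regex: intersection with backreferences

I am trying to create a regular expression in Java that matches the letter pattern of a specific word, so that I can find other words with the same pattern. For example, the word "tooth" has the pattern 12213, because both 't' and 'o' repeat. I want the regex to match other words with that pattern, such as "teeth".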

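Here is my attempt using backreferences. In this particular example, the match should fail if the second letter is the same as the first. The last letter should also be different from all the others.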

String regex = "([a-z])([a-z&&[^\1]])\\2\\1([a-z&&[^\1\2]])"; 
Pattern p = Pattern.compile(regex); 
Matcher m = p.matcher("tooth"); 

//This works as expected 
assertTrue(m.matches()); 

m.reset("tooto"); 
//This should return false, but instead returns true 
assertFalse(m.matches()); 

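I have verified that it works for words like "toot" if I remove the last group, as in the following, so I know the backreferences work up to that point: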

String regex = "([a-z])([a-z&&[^\1]])\\2\\1";

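But if I add the last group back to the end of the pattern, it is as if the backreferences inside the square brackets are no longer recognized.

Am I doing something wrong, or is this a bug?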

Answers

4

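If you print your regex, you will get a clue about what is wrong: the backreferences inside your character classes are written with a single backslash, so Java treats them as string escape sequences and turns them into odd control characters before the regex engine ever sees them. That is why it does not work as expected. For example: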

m.reset("oooto"); 
System.out.println(m.matches()); 

also prints

true
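To make the escaping issue visible, here is a minimal sketch. The class and method names are illustrative and not from the original post. It relies on the fact that in a Java string literal `\1` is an octal escape for the character U+0001, so the character class `[^\1]` only excludes that control character and never refers to group 1.

```java
import java.util.regex.Pattern;

class OctalEscapeDemo {
    // The question's pattern, typed exactly as posted:
    // "\1" and "\2" inside the brackets carry a single backslash.
    static final String ORIGINAL = "([a-z])([a-z&&[^\1]])\\2\\1([a-z&&[^\1\2]])";

    // Code point of the single character that "\1" becomes in a string literal.
    static int codePointOfBackslashOne() {
        return "\1".charAt(0);
    }

    // Full-string match against the question's pattern.
    static boolean originalMatches(String word) {
        return Pattern.matches(ORIGINAL, word);
    }

    public static void main(String[] args) {
        System.out.println("\1".length());            // one character, not a backslash plus a digit
        System.out.println(codePointOfBackslashOne()); // numeric value of that character
        System.out.println(originalMatches("tooth"));
        System.out.println(originalMatches("tooto"));  // the case the question complains about
    }
}
```

Since every lowercase letter differs from U+0001 and U+0002, the last group accepts any letter, which is consistent with "tooto" matching in the question.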

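Also, `&&` will not help here: a backreference is not recognized inside a character class, so you will have to use a lookahead instead. This expression works for your examples above: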

String regex = "([a-z])(?!\\1)([a-z])\\2\\1(?!(\\1|\\2))[a-z]"; 

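The expression (?!\\1) is a negative lookahead: it checks that the next character is not the same as the one captured by the first group, without advancing the regex cursor.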
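A short sketch of how the lookahead version behaves on a few words, assuming full-string matching with `matches()` as in the question. The class name `LookaheadPatternDemo` and the helper `hasPattern12213` are made up for illustration.

```java
import java.util.regex.Pattern;

class LookaheadPatternDemo {
    // The lookahead-based expression for the letter pattern 12213.
    static final Pattern P =
        Pattern.compile("([a-z])(?!\\1)([a-z])\\2\\1(?!(\\1|\\2))[a-z]");

    // True when the whole word follows the pattern 12213.
    static boolean hasPattern12213(String word) {
        return P.matcher(word).matches();
    }

    public static void main(String[] args) {
        String[] words = {"tooth", "teeth", "tooto", "oooto", "toott"};
        for (String w : words) {
            System.out.println(w + " -> " + hasPattern12213(w));
        }
    }
}
```

Each lookahead is placed immediately before the letter it constrains, so the check runs before that letter is consumed.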

+0

When I run my unit test with my original regex on "oooto", it returns false, not true as you said. However, the regex you suggested seems to work the way I need it to. Thanks. :) – beldenge 2012-07-19 13:23:33

4

Try this:

(?i)\b(([a-z])(?!\2)([a-z])\3\2(?!\3)[a-z]+)\b 

Explanation

(?i)   # Match the remainder of the regex with the options: case insensitive (i) 
\b    # Assert position at a word boundary 
(    # Match the regular expression below and capture its match into backreference number 1 
    (    # Match the regular expression below and capture its match into backreference number 2 
     [a-z]   # Match a single character in the range between "a" and "z" 
    ) 
    (?!   # Assert that it is impossible to match the regex below starting at this position (negative lookahead) 
     \2    # Match the same text as most recently matched by capturing group number 2 
    ) 
    (    # Match the regular expression below and capture its match into backreference number 3 
     [a-z]   # Match a single character in the range between "a" and "z" 
    ) 
    \3    # Match the same text as most recently matched by capturing group number 3 
    \2    # Match the same text as most recently matched by capturing group number 2 
    (?!   # Assert that it is impossible to match the regex below starting at this position (negative lookahead) 
     \3    # Match the same text as most recently matched by capturing group number 3 
    ) 
    [a-z]   # Match a single character in the range between "a" and "z" 
     +    # Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
) 
\b    # Assert position at a word boundary 

Code

try {
    Pattern regex = Pattern.compile("(?i)\\b(([a-z])(?!\\2)([a-z])\\3\\2(?!\\3)[a-z]+)\\b");
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        for (int i = 1; i <= regexMatcher.groupCount(); i++) {
            // matched text: regexMatcher.group(i)
            // match start: regexMatcher.start(i)
            // match end: regexMatcher.end(i)
        }
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

See it in action here. Hope this helps.
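As a rough illustration of how this word-boundary version could be used to scan running text, here is a self-contained sketch. The class `WordScanDemo`, the method `findWords`, and the sample sentence are assumptions added for the example and are not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordScanDemo {
    // Same expression as in the answer above; group 1 is the whole word.
    static final Pattern P =
        Pattern.compile("(?i)\\b(([a-z])(?!\\2)([a-z])\\3\\2(?!\\3)[a-z]+)\\b");

    // Collect every word in the text that the expression accepts.
    static List<String> findWords(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findWords("my tooth and teeth hurt"));
        System.out.println(findWords("toothpaste"));
        System.out.println(findWords("toott"));
    }
}
```

Two things to keep in mind about this expression: the trailing `[a-z]+` lets longer words such as "toothpaste" match, and the final lookahead only rules out the third group, so a word like "toott" is accepted even though its last letter repeats the first. If a strict 12213 pattern is needed, the last part would have to be tightened, for example with a lookahead that excludes both captured letters and a single `[a-z]`.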

+0

**You are right, backreferences do not work inside character classes. But there is no need to shout** ;) – 2012-07-19 05:22:20

+0

Sorry, it was just an impulse, from wanting to know whether there is something that I *really* do *not* know! – Cylian 2012-07-19 05:23:43