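Regex look-behind without obvious maximum length in Java

I always thought that a look-behind assertion in Java's regex API (and in many other languages, for that matter) must have an obvious length. So the STAR and PLUS quantifiers are not allowed inside look-behinds.
The excellent online resource regular-expressions.info seems to confirm (some of) my assumptions: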
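"[...] Java takes things a step further by allowing finite repetition. You still cannot use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. Java recognizes the fact that finite repetition can be rewritten as an alternation of strings with different, but fixed lengths. Unfortunately, the JDK 1.4 and 1.5 have some bugs when you use alternation inside lookbehind. These were fixed in JDK 1.6. [...]"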
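To make the "finite repetition" in the quote concrete, here is a minimal sketch of a look-behind with an explicit upper bound. The class name, the helper `replaceB`, and the bound of 20 are assumptions made up for this illustration; they do not come from the quoted article.

```java
import java.util.regex.Pattern;

class BoundedLookBehindDemo {
    // Hypothetical upper bound, chosen only for this illustration.
    static final int MAX_AS = 20;

    // Replaces every "B" that is preceded by "C" plus at most MAX_AS "a" characters.
    static String replaceB(String input) {
        String regex = "(?<=Ca{0," + MAX_AS + "})B";
        return Pattern.compile(regex).matcher(input).replaceAll("FOO");
    }

    public static void main(String[] args) {
        // The "B" follows "C" and a short run of "a", so it is replaced.
        System.out.println(replaceB("CaaaaaaaaaaaaaaaBaaaa"));
        // No "C" before the run of "a", so the look-behind fails and nothing changes.
        System.out.println(replaceB("aaaB"));
    }
}
```

Because the repetition has a stated maximum, the regex engine can work out how far back it ever needs to look.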
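Using curly braces works as long as the total length of the range of characters inside the look-behind is less than or equal to Integer.MAX_VALUE. So these regexes are valid: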
"(?<=a{0," +(Integer.MAX_VALUE) + "})B"
"(?<=Ca{0," +(Integer.MAX_VALUE-1) + "})B"
"(?<=CCa{0," +(Integer.MAX_VALUE-2) + "})B"
But these are not:
"(?<=Ca{0," +(Integer.MAX_VALUE) +"})B"
"(?<=CCa{0," +(Integer.MAX_VALUE-1) +"})B"
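The claims above can be checked with a small sketch that tries to compile each pattern and reports whether `Pattern.compile` rejects it. The class name and the helper `compileError` are hypothetical. The valid/invalid split described here was observed on Java 1.6.0_14, and other JDK versions may behave differently, so no expected output is shown.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LookBehindCompileCheck {
    // Returns null when the regex compiles, otherwise the error description.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        String[] regexes = {
            "(?<=a{0," + max + "})B",
            "(?<=Ca{0," + (max - 1) + "})B",
            "(?<=CCa{0," + (max - 2) + "})B",
            "(?<=Ca{0," + max + "})B",
            "(?<=CCa{0," + (max - 1) + "})B"
        };
        for (String regex : regexes) {
            String error = compileError(regex);
            System.out.println(regex + " -> "
                    + (error == null ? "compiles" : "rejected: " + error));
        }
    }
}
```

Only compilation is attempted here; no input is matched against the patterns.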
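However, I don't understand the following:
When I run a test using the * and + quantifiers inside a look-behind, all goes well (see the output of Test 1 and Test 2).
But when I add a single character at the start of the look-behind from Test 1 and Test 2, it breaks (see the output of Test 3).
Making the greedy * from Test 3 reluctant has no effect; it still breaks (see Test 4).
Here's the test harness: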
public class Main {

    // Reports whether the regex compiles and finds a match in the input.
    private static String testFind(String regex, String input) {
        try {
            boolean returned = java.util.regex.Pattern.compile(regex).matcher(input).find();
            return "testFind : Valid -> regex = "+regex+", input = "+input+", returned = "+returned;
        } catch(Exception e) {
            return "testFind : Invalid -> "+regex+", "+e.getMessage();
        }
    }

    // Reports the result of replacing every match with "FOO".
    private static String testReplaceAll(String regex, String input) {
        try {
            String returned = input.replaceAll(regex, "FOO");
            return "testReplaceAll : Valid -> regex = "+regex+", input = "+input+", returned = "+returned;
        } catch(Exception e) {
            return "testReplaceAll : Invalid -> "+regex+", "+e.getMessage();
        }
    }

    // Reports the result of splitting the input on the regex.
    private static String testSplit(String regex, String input) {
        try {
            String[] returned = input.split(regex);
            return "testSplit : Valid -> regex = "+regex+", input = "+input+", returned = "+java.util.Arrays.toString(returned);
        } catch(Exception e) {
            return "testSplit : Invalid -> "+regex+", "+e.getMessage();
        }
    }

    public static void main(String[] args) {
        String[] regexes = {"(?<=a*)B", "(?<=a+)B", "(?<=Ca*)B", "(?<=Ca*?)B"};
        String input = "CaaaaaaaaaaaaaaaBaaaa";
        int test = 0;
        for(String regex : regexes) {
            test++;
            System.out.println("********************** Test "+test+" **********************");
            System.out.println(" "+testFind(regex, input));
            System.out.println(" "+testReplaceAll(regex, input));
            System.out.println(" "+testSplit(regex, input));
            System.out.println();
        }
    }
}
The output:
********************** Test 1 **********************
testFind : Valid -> regex = (?<=a*)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = true
testReplaceAll : Valid -> regex = (?<=a*)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = CaaaaaaaaaaaaaaaFOOaaaa
testSplit : Valid -> regex = (?<=a*)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = [Caaaaaaaaaaaaaaa, aaaa]
********************** Test 2 **********************
testFind : Valid -> regex = (?<=a+)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = true
testReplaceAll : Valid -> regex = (?<=a+)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = CaaaaaaaaaaaaaaaFOOaaaa
testSplit : Valid -> regex = (?<=a+)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = [Caaaaaaaaaaaaaaa, aaaa]
********************** Test 3 **********************
testFind : Invalid -> (?<=Ca*)B, Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^
testReplaceAll : Invalid -> (?<=Ca*)B, Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^
testSplit : Invalid -> (?<=Ca*)B, Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^
********************** Test 4 **********************
testFind : Invalid -> (?<=Ca*?)B, Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^
testReplaceAll : Invalid -> (?<=Ca*?)B, Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^
testSplit : Invalid -> (?<=Ca*?)B, Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^
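My question may be obvious, but I'll ask it anyway: can anyone explain to me why Tests 1 and 2 succeed while Tests 3 and 4 fail? I would have expected them all to fail, not half of them to work and half of them to fail.
Thanks.
PS. I'm using: Java version 1.6.0_14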
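I didn't think of looking at the source of Pattern... silly me. Thanks! It makes perfect sense now. – 2009-10-08 12:03:03
One of the many reasons I can't live without Eclipse is my Ctrl-click finger (if you don't use Eclipse, that means "open the source file where this name is defined"). – 2009-10-08 12:18:17
Thanks, I never bothered to attach the source code to Eclipse. I'll definitely do that now. Thanks. – 2009-10-08 12:21:20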