2012-06-12

Java regex pattern matching

I need to check a pattern against some texts (I have to check whether my pattern appears in a lot of texts).

Here is my example:

String pattern = "^[a-zA-Z ]*toto win(\\W)*[a-zA-Z ]*$";
if ("toto win because of".matches(pattern))
    System.out.println("we have a winner");
else
    System.out.println("we DON'T have a winner");

For my tests the pattern must match, but with my regex it doesn't. Must match:

" toto win bla bla"
"toto win because of"
"toto win. bla bla"
"here. toto win. bla bla"
"here? toto win. bla bla"
"here %dfddfd . toto win. bla bla"

Must not match:

" -toto win bla bla" 
" pretoto win bla bla" 

I tried to do this with my regex, but it doesn't work.

Can you point out what I'm doing wrong?
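To see exactly which examples the pattern above gets wrong, it can help to run it against both lists at once. This is a minimal sketch; the class name QuestionPatternCheck and the test helper are hypothetical. It uses String.matches(), which, like the if statement in the question, requires the regex to match the whole string.

```java
// Hypothetical harness: runs one candidate regex against every
// "must match" and "must not match" example from the question.
class QuestionPatternCheck {
    static final String[] MUST_MATCH = {
        " toto win bla bla",
        "toto win because of",
        "toto win. bla bla",
        "here. toto win. bla bla",
        "here? toto win. bla bla",
        "here %dfddfd . toto win. bla bla"
    };

    static final String[] MUST_NOT_MATCH = {
        " -toto win bla bla",
        " pretoto win bla bla"
    };

    // String.matches() succeeds only if the regex matches the entire input.
    static boolean test(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        String pattern = "^[a-zA-Z ]*toto win(\\W)*[a-zA-Z ]*$";
        for (String s : MUST_MATCH) {
            System.out.println("must match:     " + test(pattern, s) + "  \"" + s + "\"");
        }
        for (String s : MUST_NOT_MATCH) {
            System.out.println("must not match: " + test(pattern, s) + "  \"" + s + "\"");
        }
    }
}
```

With this pattern the sketch prints false for the three "here..." examples, because [a-zA-Z ]* cannot consume ".", "?" or "%", and true for " pretoto win bla bla", because nothing in the pattern requires a word boundary before "toto".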


Will the quotes appear in the input string? – Cylian


It can be anything. It's just ordinary text. –


Please [don't add signatures or taglines to your posts](http://stackoverflow.com/faq#signatures). You also keep misspelling "a lot": there is a space between "a" and "lot". – meagar

Answers


This will work:

(?im)^[?.\s%a-z]*?\btoto win\b.+$ 

Explanation

String regex =
    "(?im)" +       // Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
    "^" +           // Assert position at the beginning of a line (at the beginning of the string or after a line break character)
    "[?.\\s%a-z]" + // Match a single character present in the list below:
                    //   one of the characters "?."
                    //   a whitespace character (spaces, tabs, and line breaks)
                    //   the character "%"
                    //   a character in the range between "a" and "z"
    "*?" +          // Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
    "\\b" +         // Assert position at a word boundary
    "toto\\ win" +  // Match the characters "toto win" literally
    "\\b" +         // Assert position at a word boundary
    "." +           // Match any single character that is not a line break character
    "+" +           // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
    "$";            // Assert position at the end of a line (at the end of the string or before a line break character)
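The regex above can be tried on the question's examples with Pattern and Matcher.find(). This is a minimal sketch; the class name FirstAnswerCheck and the found helper are illustrative, not part of the answer. Because of (?m), ^ and $ can also match at line breaks, so find() can locate a qualifying line inside a longer multi-line text.

```java
import java.util.regex.Pattern;

// Illustrative check of the first answer's regex against the question's examples.
class FirstAnswerCheck {
    // The answer's regex written as a Java string literal (each backslash doubled).
    static final Pattern P = Pattern.compile("(?im)^[?.\\s%a-z]*?\\btoto win\\b.+$");

    // find() searches for a match anywhere in the text.
    static boolean found(String text) {
        return P.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] mustMatch = {
            " toto win bla bla",
            "toto win because of",
            "toto win. bla bla",
            "here. toto win. bla bla",
            "here? toto win. bla bla",
            "here %dfddfd . toto win. bla bla"
        };
        String[] mustNotMatch = { " -toto win bla bla", " pretoto win bla bla" };

        for (String s : mustMatch) {
            System.out.println("should match:     " + found(s) + "  \"" + s + "\"");
        }
        for (String s : mustNotMatch) {
            System.out.println("should not match: " + found(s) + "  \"" + s + "\"");
        }
    }
}
```

In this sketch every "should match" line prints true and both "should not match" lines print false. The same regex also rejects "here!toto win dfddfd" ("!" is not in the character class) and a line that is exactly "toto win", because .+ needs at least one more character after the final \b.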

UPDATE 1

(?im)^[?~`'!@#$%^&*+.\s%a-z]*? toto win\b.*$

UPDATE 2

(?im)^[^-]*?\btoto win\b.*$ 

UPDATE 3

(?im)^.*?(?<!-)toto win\b.*$ 

Explanation

String regex =
    "(?im)" +       // Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
    "^" +           // Assert position at the beginning of a line (at the beginning of the string or after a line break character)
    "." +           // Match any single character that is not a line break character
    "*?" +          // Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
    "(?<!" +        // Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
        "-" +       //   Match the character "-" literally
    ")" +           // End of the lookbehind
    "toto\\ win" +  // Match the characters "toto win" literally
    "\\b" +         // Assert position at a word boundary
    "." +           // Match any single character that is not a line break character
    "*" +           // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    "$";            // Assert position at the end of a line (at the end of the string or before a line break character)
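UPDATE 2 and UPDATE 3 differ in how they exclude text in front of "toto win", and running them side by side makes the difference visible. This is a sketch; the class name UpdateComparison is hypothetical, and the inputs are taken from the question and comments, plus one made-up line ("well-known: toto win") containing a hyphen earlier in the line.

```java
import java.util.regex.Pattern;

// Hypothetical side-by-side check of UPDATE 2 and UPDATE 3.
class UpdateComparison {
    // UPDATE 2: no "-" anywhere before "toto win", plus a word boundary before "toto".
    static final Pattern UPDATE_2 = Pattern.compile("(?im)^[^-]*?\\btoto win\\b.*$");
    // UPDATE 3: only the single character directly before "toto" must not be "-".
    static final Pattern UPDATE_3 = Pattern.compile("(?im)^.*?(?<!-)toto win\\b.*$");

    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            " toto win bla bla",
            "here %dfddfd . toto win. bla bla",
            "here!toto win dfddfd",
            " -toto win bla bla",
            " pretoto win bla bla",
            "well-known: toto win"
        };
        for (String s : inputs) {
            System.out.printf("UPDATE 2: %-5b  UPDATE 3: %-5b  \"%s\"%n",
                    found(UPDATE_2, s), found(UPDATE_3, s), s);
        }
    }
}
```

In this sketch both versions accept the first three inputs and reject " -toto win bla bla". They disagree on the last two: UPDATE 2 rejects " pretoto win bla bla" thanks to its leading \b, while UPDATE 3 accepts it because its lookbehind only excludes "-"; and UPDATE 2 rejects "well-known: toto win" because [^-]*? cannot step over the earlier hyphen, while UPDATE 3 accepts it.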

The regex needs to be escaped for use in code (in a Java string literal, every backslash is doubled).
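A common escaping mistake is writing \b as "\b" inside a Java string literal. In Java source, "\b" is the backspace character (U+0008), not the regex word boundary, so the pattern silently looks for a backspace. A minimal sketch (the class name EscapingDemo is hypothetical) using the UPDATE 3 regex:

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // Correct: "\\b" in the literal is the two characters \ and b, i.e. the regex word boundary.
    static final Pattern ESCAPED = Pattern.compile("(?im)^.*?(?<!-)toto win\\b.*$");

    // Mistake: "\b" in a Java literal is a backspace character,
    // so this pattern requires a literal backspace right after "toto win".
    static final Pattern UNESCAPED = Pattern.compile("(?im)^.*?(?<!-)toto win\b.*$");

    public static void main(String[] args) {
        String text = "here. toto win. bla bla";
        System.out.println(ESCAPED.matcher(text).find());   // prints true
        System.out.println(UNESCAPED.matcher(text).find()); // prints false: no backspace in the text
    }
}
```

Both versions compile without error, which is why this mistake is easy to miss.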


This string doesn't match: "here!toto win dfddfd" –


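Actually, there can be any character. Imagine the text of a web page: we can have anything. The only thing I must not have is text or a character (in particular "-") directly in front of it, as in "blatoto win" or "-toto win". –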


Great, it does what I want. Thanks a lot. –


Your pattern is missing the space between "win" and the next word.

Try this: \\stoto\\swin\\s\\w

http://gskinner.com/RegExr/ is where you can try out your regex.


Do you mean I should have String pattern = "(\\s)*toto win(\\s)*(\\W)*"; ? –


@CC. See my edit. – dantuch


@CC, sorry, it should work properly now. – dantuch


The following regex

^[a-zA-Z. ]*toto win[a-zA-Z. ]*$ 

will match

toto win bla bla 
toto win because of 
toto win. bla bla 

and will not match

-toto win bla bla
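The character-class approach can be checked on the examples from this answer plus two from the question. A minimal sketch, assuming whole-string matching with String.matches(); the class name CharacterClassCheck is hypothetical. Any "special" character to be allowed before or after "toto win" has to be added to both character classes.

```java
// Hypothetical check of the character-class regex from this answer.
class CharacterClassCheck {
    static final String REGEX = "^[a-zA-Z. ]*toto win[a-zA-Z. ]*$";

    // Whole-string match, as with the if statement in the question.
    static boolean test(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "toto win bla bla",
            "toto win because of",
            "toto win. bla bla",
            "-toto win bla bla",
            "here? toto win. bla bla",
            " pretoto win bla bla"
        };
        for (String s : inputs) {
            System.out.println(test(s) + "  \"" + s + "\"");
        }
    }
}
```

In this sketch the first three inputs print true and "-toto win bla bla" prints false, as the answer states. "here? toto win. bla bla" prints false because "?" is not in the class, and " pretoto win bla bla" prints true because the class also accepts letters directly in front of "toto".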

This looks great, but a string like "toto win. bla bla" doesn't work. Any ideas? –


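Updated my answer. In your question you mentioned "special" characters; I added the dot. Account for whatever you consider special by adding it to the character class. You see? Add more as needed. – buckley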


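I see. I just updated my question. It still doesn't fully work. I don't know how to make sure there is no character directly before my pattern. –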


Just change your code to String pattern = "\\s*toto win[\\w\\s]*";

\W means a non-word character, and \w means a word character ([a-zA-Z_0-9]).

[\\w\\s]* will match any number of word characters and whitespace after "toto win".

UPDATE

To reflect the new requirements, this expression will work:

"((.*\\s)+|^)toto win[\\w\\s\\p{Punct}]*" 

((.*\\s)+|^) matches either anything followed by at least one whitespace character, or the start of the line.

[\\w\\s\\p{Punct}]* matches any combination of letters, digits, whitespace, and punctuation.
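Both versions of this answer can be compared on the question's examples. This sketch follows the answer in using String.matches(), so the whole string must match; the class name PunctCheck is hypothetical.

```java
// Hypothetical comparison of the original and updated regex from this answer.
class PunctCheck {
    // First version: optional whitespace, "toto win", then word characters or whitespace.
    static final String FIRST = "\\s*toto win[\\w\\s]*";
    // Updated version: anything ending in whitespace, or the start of the input,
    // then "toto win", then word characters, whitespace or ASCII punctuation.
    static final String UPDATED = "((.*\\s)+|^)toto win[\\w\\s\\p{Punct}]*";

    static boolean test(String regex, String s) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        String[] inputs = {
            " toto win bla bla",
            "toto win because of",
            "toto win. bla bla",
            "here. toto win. bla bla",
            "here %dfddfd . toto win. bla bla",
            " -toto win bla bla",
            " pretoto win bla bla"
        };
        for (String s : inputs) {
            System.out.printf("first: %-5b  updated: %-5b  \"%s\"%n",
                    test(FIRST, s), test(UPDATED, s), s);
        }
    }
}
```

In this sketch the first version rejects "toto win. bla bla" and the "here..." lines, because [\\w\\s] does not include "." and \\s* cannot consume "here". The updated version accepts the must-match strings shown and rejects both must-not-match strings. Note that \p{Punct} in Java covers ASCII punctuation only, so non-ASCII punctuation after "toto win" would still make the match fail.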


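It would be easier if you included the actual requirements instead of a list of things to match. I strongly suspect that "toto winabc" should not match, but I can't be sure, because you didn't include such an example or explain the requirements. Anyway, this works for all of your current examples: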

import java.util.regex.Pattern;

public class TotoWinTest {

    static String[] matchThese = new String[] {
        " toto win bla bla",
        "toto win because of",
        "toto win. bla bla",
        "here. toto win. bla bla",
        "here? toto win. bla bla",
        "here %dfddfd . toto win. bla bla"
    };

    static String[] dontMatchThese = new String[] {
        " -toto win bla bla",
        " pretoto win bla bla"
    };

    public static void main(String[] args) {
        // either the beginning of the input or a whitespace character, followed by "toto win"
        Pattern p = Pattern.compile("(^|\\s)toto win");

        System.out.println("Should match:");
        for (String s : matchThese) {
            System.out.println(p.matcher(s).find());
        }

        System.out.println("Shouldn't match:");
        for (String s : dontMatchThese) {
            System.out.println(p.matcher(s).find());
        }
    }
}

I gave examples to illustrate what kind of text should match. The text can be anything, so I can't use your approach. Thanks anyway. –