2013-10-23 98 views
1

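I have implemented code to count the occurrences of words in a text. However, my regular expression is not accepted for some reason, and I get the following error:

Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 12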

My code is:

import java.util.*; 

public class CountOccurrenceOfWords {

/** 
* @param args the command line arguments 
*/ 
public static void main(String[] args) { 
    // TODO code application logic here 
    char lf = '\n'; 

String text = "It was the best of times, it was the worst of times," + 
lf + 
"it was the age of wisdom, it was the age of foolishness," + 
lf + 
"it was the epoch of belief, it was the epoch of incredulity," + 
lf + 
"it was the season of Light, it was the season of Darkness," + 
lf + 
"it was the spring of hope, it was the winter of despair," + 
lf + 
"we had everything before us, we had nothing before us," + 
lf + 
"we were all going direct to Heaven, we were all going direct" + 
lf + 
"the other way--in short, the period was so far like the present" + 
lf + 
"period, that some of its noisiest authorities insisted on its" + 
lf + 
"being received, for good or for evil, in the superlative degree" + 
lf + 
"of comparison only." + 
lf + 
"There were a king with a large jaw and a queen with a plain face," + 
lf + 
"on the throne of England; there were a king with a large jaw and" + 
lf + 
"a queen with a fair face, on the throne of France. In both" + 
lf + 
"countries it was clearer than crystal to the lords of the State" + 
lf + 
"preserves of loaves and fishes, that things in general were" + 
lf + 
"settled for ever"; 

    TreeMap<String, Integer> map = new TreeMap<String, Integer>(); 
    String[] words = text.split("[\n\t\r.,;:!?(){"); 
    for(int i = 0; i < words.length; i++){ 
     String key = words[i].toLowerCase(); 

     if(key.length() > 0) { 
      if(map.get(key) == null){ 
       map.put(key, 1); 
      } 
      else{ 
       int value = map.get(key); 
       value++; 
       map.put(key, value); 
      } 
     } 
    } 

    Set<Map.Entry<String, Integer>> entrySet = map.entrySet(); 

    //Get key and value from each entry 
    for(Map.Entry<String, Integer> entry: entrySet){ 
     System.out.println(entry.getValue() + "\t" + entry.getKey()); 
    } 
    } 
} 

Also, could you give me a hint on how to sort the words alphabetically? Thanks in advance.

+0

Try escaping the brackets, the question mark and the dot. => '\[\n\t\r\.,;:!\?\(\){' –

+0

Isn't the error message clear enough? You have not closed the character class in your regular expression. "[\n\t\r.,;:!?(){" should be "[\n\t\r.,;:!?(){]". –

+0

Thank you very much, guys. I just replaced "[\n\t\r.,;:!?(){" with "[\n\t\r.,;:!?(){]". Cheers –

Answers

1

You missed the "]" at the end of the regular expression.

"[\n\t\r.,;:!?(){" is incorrect.

You need to replace your regular expression with "[\n\t\r.,;:!?(){]".
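To show the fix in context, here is a minimal sketch of the word count with the closed character class. The class name WordCountSketch and the method countWords are made up for illustration. The leading space inside the class is my own addition (an assumption about the intent), because without it the text would be split into phrases rather than single words.

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Closed character class. The leading space is an assumption added here
    // so that the text is split into single words rather than phrases.
    static final String DELIMITERS = "[ \n\t\r.,;:!?(){]";

    // Counts lower-cased words; TreeMap keeps the keys in sorted order.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split(DELIMITERS)) {
            String key = token.toLowerCase();
            // Adjacent delimiters such as ", " produce empty tokens; skip them.
            if (!key.isEmpty()) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "It was the best of times, it was the worst of times,";
        for (Map.Entry<String, Integer> entry : countWords(text).entrySet()) {
            System.out.println(entry.getValue() + "\t" + entry.getKey());
        }
    }
}
```

As for the alphabetical order asked about in the question: a TreeMap with String keys iterates in ascending key order, so the loop over entrySet() should already print the words sorted, provided the keys are lower-cased as they are here.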

+0

Works perfectly. Thank you very much. Answer accepted. –

+0

@AndreUk13 Glad to hear it... you're welcome... – Prabhakaran

0

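You need to escape special characters in a regular expression. In your case you have not escaped (, ), [, ?, . and {. Use \ to escape them, e.g. \[. You could also consider the predefined character class \s for whitespace, which matches \r, \t and so on.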
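A small sketch of the \s suggestion, in case it helps. The class name WhitespaceClassSketch and the method tokens are hypothetical, and the trailing + quantifier is my own addition so that a run of delimiters such as ",\n" counts as a single separator. Note that in a Java string literal the backslash has to be doubled, so \s is written "\\s".

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceClassSketch {

    // \s inside the character class covers space, tab, newline and
    // carriage return, so they no longer need to be listed one by one.
    // The + (an addition for this sketch) merges adjacent delimiters.
    static final String DELIMITERS = "[\\s.,;:!?(){]+";

    static List<String> tokens(String text) {
        return Arrays.asList(text.split(DELIMITERS));
    }

    public static void main(String[] args) {
        String text = "we had nothing before us,\nwe were all going";
        // prints [we, had, nothing, before, us, we, were, all, going]
        System.out.println(tokens(text));
    }
}
```

If the text can start with a delimiter, split still returns an empty first element, so a length check like the one in the question remains useful.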

+0

Thanks.. I just replaced "[\n\t\r.,;:!?(){" with "[\n\t\r.,;:!?(){]". Cheers –

0

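Your problem is an unclosed character class in your regular expression. Regular expressions have some 'predefined' characters (metacharacters) that you need to escape when you want to match them literally.

A character class is: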

With a "character class", also called "character set", you can tell the regex engine to match only one out of several characters. Simply place the characters you want to match between square brackets. Source

This means you either have to escape these characters:

\[\n\t\r\.,;:!\?\(\){ 

or close the character class, in which case you need

[\n\t\r\.,;:!\?\(\){] 

Either way, escape the dot, the question mark and the parentheses.
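For anyone who wants to verify this themselves, here is a rough check of how java.util.regex treats the variants. The class CharacterClassCheck and its helper methods are invented for this sketch. As far as I can tell, once the class is closed, the escaped and unescaped forms of the dot, question mark and parentheses split the sample text identically, so the escapes inside the brackets look optional rather than required.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CharacterClassCheck {

    // Returns true if the regex compiles, false on PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns true if both regexes split the text into the same tokens.
    static boolean sameSplit(String text, String regexA, String regexB) {
        return Arrays.equals(text.split(regexA), text.split(regexB));
    }

    public static void main(String[] args) {
        String unclosed = "[\n\t\r.,;:!?(){";          // pattern from the question
        String closed = "[\n\t\r.,;:!?(){]";           // closing bracket added
        String escaped = "[\n\t\r\\.,;:!\\?\\(\\){]";  // closed, with escapes
        String sample = "period, that some (of its) authorities? insisted.";

        System.out.println(compiles(unclosed)); // false: unclosed character class
        System.out.println(compiles(closed));
        System.out.println(compiles(escaped));
        System.out.println(sameSplit(sample, closed, escaped));
    }
}
```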

+0

Apart from the square brackets, there is no need to escape metacharacters inside a character class. – Joey

+0

Thank you very much, guys. I just replaced "[\n\t\r.,;:!?(){" with "[\n\t\r.,;:!?(){]". Cheers –