2016-04-01

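replaceAll() to remove all punctuation, with these exceptions

I want to read a .txt file into a String[] and need to remove all punctuation, with some exceptions. Here is my code: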

replaceAll("[^a-zA-Z ]", ""); 

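Exceptions:
1. Hyphen(s) inside a word are kept.
2. Words that contain digits are removed.
3. Words with two punctuation marks at the beginning or the end are removed.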

I tried using it, and it partly works, but it also removes the hyphens inside words. – zzz

Answers

[^a-zA-Z ] is a character class. That means it matches only a single character; in this case, any character that is not a-z, A-Z, or a space.
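To make the single-character behaviour concrete, here is a minimal sketch (the class and method names are hypothetical, chosen only for illustration). It shows why the pattern from the question also drops hyphens inside words and leaves the letters of a word that contained digits:

```java
final class CharClassDemo {
    // Removes every single character that is not a letter or a space.
    static String stripNonLetters(String text) {
        return text.replaceAll("[^a-zA-Z ]", "");
    }

    public static void main(String[] args) {
        // The hyphen in "well-known" is removed, and "abc123" keeps its letters.
        System.out.println(stripNonLetters("well-known abc123 end."));
        // prints: wellknown abc end
    }
}
```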

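If you want to match whole words, you need a character class with a quantifier, such as +. If you want to match several different patterns, combine them with the alternation operator |.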

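Knowing this, you can now match words that end in one or more digits, or words with digits in the middle, with something like [a-zA-Z]+[0-9]+[a-zA-Z]*. Note that the class is not negated here: [^a-zA-Z ] matches characters that are not letters, so it cannot match the letters of the word itself. I'll leave applying it to you as an exercise, since this sounds like a school assignment.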
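As a rough sketch of combining quantifiers with alternation, the snippet below removes whole words that contain a digit. The pattern, the word boundaries (\b), and the names are my own assumptions for illustration, not the answerer's solution, and it treats a "word" as a run of ASCII letters and digits only:

```java
final class DigitWordDemo {
    // Alternative 1: letters, then digits. Alternative 2: starts with digits.
    // Either way, the rest of the letter/digit run is consumed as well.
    private static final String WORD_WITH_DIGIT =
            "\\b(?:[a-zA-Z]+[0-9]+|[0-9]+)[a-zA-Z0-9]*\\b";

    static String removeWordsWithDigits(String text) {
        return text.replaceAll(WORD_WITH_DIGIT, "")
                   .replaceAll("\\s+", " ")   // collapse the gaps left behind
                   .trim();
    }

    public static void main(String[] args) {
        System.out.println(removeWordsWithDigits("abc123 hello 4you x9y ok"));
        // prints: hello ok
    }
}
```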

Doesn't work, friend. The result is the same. – zzz

OK, cool! Thanks, buddy! – zzz

My regex is quite complicated, but it works.

\S*\d+\S*|\p{Punct}{2,}\S*|\S*\p{Punct}{2,}|[\p{Punct}&&[^-]]+|(?<![a-z])\-(?![a-z]) 

Explanation:

Match this alternative «\S*\d+\S*»
    Match a single character that is NOT a "whitespace character" «\S*»
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Match a single character that is a "digit" «\d+»
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    Match a single character that is NOT a "whitespace character" «\S*»
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Or match this alternative «\p{Punct}{2,}\S*»
    Match a character from the POSIX character class "punct" «\p{Punct}{2,}»
     Between 2 and unlimited times, as many times as possible, giving back as needed (greedy) «{2,}»
    Match a single character that is NOT a "whitespace character" «\S*»
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Or match this alternative «\S*\p{Punct}{2,}»
    Match a single character that is NOT a "whitespace character" «\S*»
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Match a character from the POSIX character class "punct" «\p{Punct}{2,}»
     Between 2 and unlimited times, as many times as possible, giving back as needed (greedy) «{2,}»
Or match this alternative «[\p{Punct}&&[^-]]+»
    Match a single character present in the list below «[\p{Punct}&&[^-]]+»
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
     A character from the POSIX character class "punct" «\p{Punct}»
     Except the literal character "-" «&&[^-]»
Or match this alternative «(?<![a-z])\-(?![a-z])»
    Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<![a-z])»
     Match a single character in the range between "a" and "z" «[a-z]»
    Match the character "-" literally «\-»
    Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?![a-z])»
     Match a single character in the range between "a" and "z" «[a-z]»
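The last two alternatives are the least obvious, so here is a small sketch that tries them in isolation (the class and method names are hypothetical, and (?i) is added so the lookarounds also see uppercase letters). It also shows a consequence of the lookarounds: a hyphen attached to only one side of a word, as in "-c" or "d-", is not matched by that alternative.

```java
final class PunctPieces {
    // Character-class intersection: any punctuation character except '-'.
    static String stripPunctExceptHyphen(String text) {
        return text.replaceAll("[\\p{Punct}&&[^-]]+", "");
    }

    // Lookarounds: a '-' with no letter directly before it and none directly after it.
    static String stripLoneHyphens(String text) {
        return text.replaceAll("(?i)(?<![a-z])-(?![a-z])", "");
    }

    public static void main(String[] args) {
        System.out.println(stripPunctExceptHyphen("a,b-c!?"));   // prints: ab-c
        // Only the free-standing hyphen goes; "a-b", "-c" and "d-" keep theirs.
        System.out.println(stripLoneHyphens("a-b - -c d-"));
    }
}
```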

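Example using the regex above:

String text = "a-b ab--- - ---a --- , ++++ ?%# $22 43 4zzv";
// (?i) makes the [a-z] lookarounds case-insensitive.
String rx = "(?i)\\S*\\d+\\S*|\\p{Punct}{2,}\\S*|\\S*\\p{Punct}{2,}|[\\p{Punct}&&[^-]]+|(?<![a-z])\\-(?![a-z])";
// Replace every match with a space, then trim the leading and trailing spaces.
String result = text.replaceAll(rx, " ").trim();
System.out.println(result);

The code will print: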

a-b 
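Since the question asks for a String[], one possible way to finish the job is to split the cleaned text on whitespace. This is only a sketch under assumptions: the class and method names are made up, the regex is the one from this answer, and reading the .txt file into a String is left out.

```java
final class WordCleaner {
    // The regex from the answer above, unchanged.
    private static final String RX =
            "(?i)\\S*\\d+\\S*|\\p{Punct}{2,}\\S*|\\S*\\p{Punct}{2,}"
            + "|[\\p{Punct}&&[^-]]+|(?<![a-z])\\-(?![a-z])";

    static String[] cleanWords(String text) {
        String cleaned = text.replaceAll(RX, " ").trim();
        // split() on an empty string would yield one empty element, so guard it.
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        String[] words = cleanWords("well-known words, here 4you");
        System.out.println(String.join("|", words));
        // prints: well-known|words|here
    }
}
```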

It still leaves "-" and "-" followed by a word. – zzz

Anyway, thank you! – zzz

What does each character do? I'm still stuck there. – zzz
