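I want to build a String[] from a .txt file, and I need to remove all punctuation, with some exceptions. Here is my code; this replaceAll() call removes all punctuation but does not handle the exceptions:
replaceAll("[^a-zA-Z ]", "");
Exceptions:
1. Keep hyphen(s) that appear inside a word.
2. Remove words that contain digits.
3. Remove words that have two punctuation marks at the beginning and at the end.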
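To make the problem concrete, here is a minimal sketch of what the replaceAll() call above does to a sample line. The class name NaiveStripDemo, the helper naiveStrip, and the sample string are made up for illustration only.

```java
class NaiveStripDemo {
    // The replaceAll() from the question: drops everything except ASCII letters and spaces.
    static String naiveStrip(String text) {
        return text.replaceAll("[^a-zA-Z ]", "");
    }

    public static void main(String[] args) {
        // The hyphen inside "well-known" is lost, and "v2" is only
        // stripped of its digit instead of being removed as a whole word.
        System.out.println(naiveStrip("well-known v2 ok!")); // wellknown v ok
    }
}
```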
I have a fairly complicated regex, but it works.
\S*\d+\S*|\p{Punct}{2,}\S*|\S*\p{Punct}{2,}|[\p{Punct}&&[^-]]+|(?<![a-z])\-(?![a-z])
Explanation:
Match this alternative «\S*\d+\S*»
  Match a single character that is NOT a "whitespace character" «\S*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
  Match a single character that is a "digit" «\d+»
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
  Match a single character that is NOT a "whitespace character" «\S*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Or match this alternative «\p{Punct}{2,}\S*»
  Match a character from the POSIX character class "punct" «\p{Punct}{2,}»
    Between 2 and unlimited times, as many times as possible, giving back as needed (greedy) «{2,}»
  Match a single character that is NOT a "whitespace character" «\S*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Or match this alternative «\S*\p{Punct}{2,}»
  Match a single character that is NOT a "whitespace character" «\S*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
  Match a character from the POSIX character class "punct" «\p{Punct}{2,}»
    Between 2 and unlimited times, as many times as possible, giving back as needed (greedy) «{2,}»
Or match this alternative «[\p{Punct}&&[^-]]+»
  Match a single character present in the list below «[\p{Punct}&&[^-]]+»
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    A character from the POSIX character class "punct" «\p{Punct}»
    Except the literal character "-" «&&[^-]»
Or match this alternative «(?<![a-z])\-(?![a-z])»
  Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<![a-z])»
    Match a single character in the range between "a" and "z" «[a-z]»
  Match the character "-" literally «\-»
  Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?![a-z])»
    Match a single character in the range between "a" and "z" «[a-z]»
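One way to check the breakdown above is to run each alternative on its own and list what it matches. The following is only a sketch: the class AlternativeDemo, the helper matches, and the shortened sample text are assumptions made for illustration, and Pattern.CASE_INSENSITIVE stands in for the (?i) flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternativeDemo {
    // Collects every match of a single alternative, ignoring case.
    static List<String> matches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "a-b ab--- - $22 ,";
        // Words that contain at least one digit.
        System.out.println(matches("\\S*\\d+\\S*", text));              // [$22]
        // Runs of two or more punctuation characters plus whatever follows.
        System.out.println(matches("\\p{Punct}{2,}\\S*", text));        // [---]
        // Words that end in two or more punctuation characters.
        System.out.println(matches("\\S*\\p{Punct}{2,}", text));        // [ab---]
        // Any punctuation except the hyphen.
        System.out.println(matches("[\\p{Punct}&&[^-]]+", text));       // [$, ,]
        // Hyphens with no letter directly before and no letter directly after.
        System.out.println(matches("(?<![a-z])\\-(?![a-z])", text));    // [-, -, -]
    }
}
```

Note that the hyphen in "a-b" is not matched by any alternative, which is why it survives the replacement.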
Example of the above:
String text ="a-b ab--- - ---a --- , ++++ ?%# $22 43 4zzv";
String rx = "(?i)\\S*\\d+\\S*|\\p{Punct}{2,}\\S*|\\S*\\p{Punct}{2,}|[\\p{Punct}&&[^-]]+|(?<![a-z])\\-(?![a-z])";
String result = text.replaceAll(rx, " ").trim();
System.out.println(result);
The code prints:
a-b
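Since the question asks for a String[], the cleaned text can then be split on whitespace. This is a rough sketch under that assumption: the class PunctuationCleaner, the method toWords, and the second sample sentence are hypothetical, and reading the .txt file is left out.

```java
import java.util.Arrays;

class PunctuationCleaner {
    // Same pattern as in the example; (?i) makes the [a-z] lookarounds case-insensitive.
    static final String RX = "(?i)\\S*\\d+\\S*|\\p{Punct}{2,}\\S*|\\S*\\p{Punct}{2,}"
            + "|[\\p{Punct}&&[^-]]+|(?<![a-z])\\-(?![a-z])";

    // Replaces unwanted tokens with spaces, then splits the remainder into words.
    static String[] toWords(String text) {
        String cleaned = text.replaceAll(RX, " ").trim();
        if (cleaned.isEmpty()) {
            return new String[0]; // avoid returning a single empty string
        }
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "A well-known fact, really! v2 x--";
        System.out.println(Arrays.toString(toWords(text))); // [A, well-known, fact, really]
    }
}
```

The empty-string check is there because "".split("\\s+") returns an array holding one empty string rather than an empty array.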
I tried it and it partly works, but it also removes the hyphen inside a word. – zzz