2012-11-14 21 views
7

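Regex; eliminate all punctuation except

I have the following regular expression, which splits on any whitespace or punctuation mark. How can I exclude one or more punctuation marks from [:punct:]? Say I want to exclude the apostrophe and the comma. I know I could explicitly use [all punctuation marks in here] instead of [[:punct:]], but I'm hoping there is an exclusion method.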

X <- "I'm not that good at regex yet, but am getting better!" 
strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE) 

[1] "I"  "'"  "m"  "not"  "that" "good" "at"  "regex" "yet"  
[10] ","  ""  "but"  "am"  "getting" "better" "!" 

Answers

8

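It's not clear to me what result you want, but you may be able to use a negated character class, as in this answer. The class [^,'[:^punct:]] rejects the comma, the apostrophe, and every non-punctuation character, so what remains is all punctuation except those two marks.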

R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]] 
[1] "I'm"  "not"  "that" "good" "at"  "regex" "yet," 
[8] "but"  "am"  "getting" "better" "!"  
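
To see what the negated class actually matches, it can help to extract the matched characters instead of splitting on them. The following is a small sketch, not part of the original answer; the variable names all_punct and kept_punct are made up for illustration.

```r
X <- "I'm not that good at regex yet, but am getting better!"

# Every punctuation character that occurs in X
all_punct <- regmatches(X, gregexpr("[[:punct:]]", X, perl = TRUE))[[1]]

# Same search with the negated class: the comma and the apostrophe are
# listed explicitly, and [:^punct:] rules out everything that is not
# punctuation, so only the remaining punctuation can match
kept_punct <- regmatches(X, gregexpr("[^,'[:^punct:]]", X, perl = TRUE))[[1]]

print(all_punct)   # "'" "," "!"
print(kept_punct)  # "!"
```

Only the exclamation mark survives, which is why "I'm" and "yet," stay intact in the split above.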
+1 My head hurts...

0

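You can restrict the [[:punct:]] PCRE subpattern with a (?![',]) negative lookahead, which fails the match if the next character immediately to the right is ' or ,: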

[[:space:]]|(?=(?![',])[[:punct:]]) 
               ^^^^^^^^

regex demo

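Details

  • [[:space:]] - any whitespace
  • | - or
  • (?=(?![',])[[:punct:]]) - a positive lookahead that requires that, immediately to the right of the current position, there is no ' or , and there is exactly one punctuation character (in effect, any punctuation character other than ' and ,).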

R online demo

X <- "I'm not that good at regex yet, but am getting better!" 
strsplit(X, "[[:space:]]|(?=(?![',])[[:punct:]])", perl=TRUE) 
[[1]] 
[1] "I'm"  "not"  "that" "good" "at"  "regex" "yet," 
[8] "but"  "am"  "getting" "better" "!"
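
The lookahead form and the negated class from the other answer should pick out the same set of characters. As a rough check (a sketch only, with hypothetical variable names), both can be tested against every printable ASCII character:

```r
# All printable, non-space ASCII characters (codes 33 to 126), one per element
ascii <- strsplit(rawToChar(as.raw(33:126)), "")[[1]]

# TRUE where the character is punctuation other than ' and ,
by_negated_class <- grepl("[^,'[:^punct:]]", ascii, perl = TRUE)
by_lookahead     <- grepl("(?![',])[[:punct:]]", ascii, perl = TRUE)

# Both patterns select exactly the same characters
print(identical(by_negated_class, by_lookahead))  # TRUE

# The punctuation marks that still trigger a split
print(ascii[by_lookahead])
```

To exclude other marks, change the characters listed inside [',]; characters that are special inside a bracket expression, such as ], \, ^ and -, would need escaping or careful placement.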