2012-12-24 106 views
0

An easy regular expression is confusing me: I can't seem to extract the email address from the phrase below.

"mailto:[email protected]?"

So far I have tried:

regexpr(":([^\\?*]?)", phrase) 

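The logic of the code is as follows:

  1. Start at the colon character :
  2. Take every character that is not a question mark
  3. Return the characters matched inside the parentheses.

I don't know where the mistake in my regular expression is.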

Answers

9

Let's look at your regex and see where you went wrong. We'll break it apart to make it easier to talk about:

:   Just a literal colon, no worries here. 
(   Open a capture group. 
    [  Open a character class, this will match one character. 
     ^ The leading ^ means "negate this class" 
     \\ This ends up as a single \ when the regex engine sees it and that will 
      escape the next character. 
     ? This has no special meaning inside a character class, sometimes a 
      question mark is just a question mark and this is one of those 
      times. Escaping a simple character doesn't do anything interesting. 
     * Again, we're in a character class so * has no special meaning. 
    ]  Close the character class. 
    ?  Zero or one of the preceding pattern. 
)   Close the capture group. 

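Stripping out the noise gives us :([^?*]?)

So your regex actually matches:

A colon followed by zero or one character that is not a question mark or an asterisk, and that non-question-mark, non-asterisk character (if any) will be in the first capture group.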
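To see this behaviour in R, here is a minimal sketch. The address in the question is redacted, so `user@example.com` is a stand-in chosen purely for illustration, and `original_match` is just a name made up for this example.

```r
# Hypothetical input: "user@example.com" stands in for the redacted address.
phrase <- "mailto:user@example.com?"

# The pattern from the question: a colon plus at most ONE character
# that is neither "?" nor "*".
m <- regexpr(":([^\\?*]?)", phrase)

# regmatches() pulls out the text that regexpr() located.
original_match <- regmatches(phrase, m)
original_match
# [1] ":u"
```

Only the colon and a single following character come back, which is why the result looked so wrong.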

That's not at all what you're trying to do. A couple of adjustments should sort you out:

:([^?]*) 

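That matches:

A colon followed by any number of characters that are not question marks, and those characters will be in the first capture group.

The * is special outside a character class, where it means "zero or more"; inside a character class it is just a *.

I'll leave it to someone else to help you with the R side of things; I just wanted you to understand what's going on with the regex.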
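For what it's worth, one way the corrected pattern might be applied in base R is with regexec(), which also reports capture groups. This is only a sketch under assumptions: `user@example.com` is a placeholder for the redacted address, and `email` is an illustrative variable name.

```r
# Hypothetical input: "user@example.com" stands in for the redacted address.
phrase <- "mailto:user@example.com?"

# regexec() records the position of the whole match and of each capture group.
m <- regexec(":([^?]*)", phrase)

# regmatches() returns a list with one character vector per input string:
# element 1 is the whole match, element 2 is the first capture group.
email <- regmatches(phrase, m)[[1]][2]
email
# [1] "user@example.com"
```

Plain regexpr() only reports the whole match (colon included), so regexec() is the more direct choice when the capture group is what you want.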

+0

Thanks for breaking everything down. I now see where my mistake was and what I should actually be doing. This was really helpful. – user1103294

+0

@user1103294: Thanks, I like to think of myself as a [fishing instructor](http://www.quotationspage.com/quote/2279.html) :) –

3

Here is a very simple approach with gsub:

gsub("([a-z]+:)(.*)([?]$)", "\\2", "mailto:[email protected]?") 
## Or, if you expect things other than characters before the colon 
gsub("(.*:)(.*)([?]$)", "\\2", "mailto:[email protected]?") 
## Or, discarding the first and third groups since they aren't very useful 
gsub(".*:(.*)[?]$", "\\1", "mailto:[email protected]?") 

Building on where @TylerRinker started, you can also use strsplit as follows (to avoid having to gsub out the question mark afterwards):

strsplit("mailto:[email protected]?", ":|\\?", fixed=FALSE)[[1]][2] 

What if you have a vector of strings like this?

phrase <- c("mailto:[email protected]?", 
      "mailto:[email protected]?") 
phrase 
# [1] "mailto:[email protected]?" 
# [2] "mailto:[email protected]?" 

## Using gsub 
gsub("(.*:)(.*)([?]$)", "\\2", phrase) 
# [1] "[email protected]"  "[email protected]" 

## Using strsplit 
sapply(phrase, 
     function(x) strsplit(x, ":|\\?", fixed=FALSE)[[1]][2], 
     USE.NAMES=FALSE) 
# [1] "[email protected]"  "[email protected]" 

I prefer the gsub approach for its conciseness.
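As a side note, strsplit is itself vectorised over its input, so the sapply wrapper is not strictly needed. A possible variant is sketched below; the addresses are placeholders made up for illustration (the real ones are redacted), and `parts` and `emails` are hypothetical names.

```r
# Hypothetical inputs standing in for the redacted addresses.
phrase <- c("mailto:first@example.com?",
            "mailto:second@example.com?")

# Split every string on ":" or "?" in one call; the result is a list
# with one character vector per input string.
parts <- strsplit(phrase, "[:?]")

# Take the second piece of each split; vapply() checks that each result
# is a single character string.
emails <- vapply(parts, `[`, character(1), 2)
emails
# [1] "first@example.com"  "second@example.com"
```

The character class [:?] does the same job as the alternation ":|\\?" without needing the backslash escape.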

+0

Thanks Ananda, I'll upvote your answer as well. I like how it enforces the gibberish before the colon. – user1103294