2016-01-12

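R partial string matching - exclude

I have a list of emails that I want to clean up. Specifically, if the '@' character is not present in a given email, I want to remove that email, so that an input such as 'mywebsite.com' would be removed.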

My code is as follows:

email_clean <- function(email, invalid = NA){
    email <- trimws(email)                                   # Remove leading/trailing whitespace
    email[nchar(email) %in% c(1, 2)] <- invalid              # Invalidate entries only 1 or 2 characters long
    bad_email <- c("\\@no.com", "\\@na.com", "\\@none.com", "\\@email.com",   # List of bad emails - modify to the
                   "\\@noemail.com", "\\@test.com")                           # specifications of the request

    pattern <- paste0("(?i)\\b", paste0(bad_email, collapse = "\\b|\\b"), "\\b")   # Case-insensitive alternation of the bad domains
    email <- gsub(pattern, invalid, sapply(email, as.character))   # With invalid = NA, entries matching a bad domain become NA
    unname(email)
}

    ## Define vector of cleaned emails from the original csv column
    Cleaned_Email <- email_clean(my_data$Email)


    ## Bind the cleaned emails to the csv data
    my_data <- cbind(my_data, Cleaned_Email)

Thanks!


What is your question? – nrussell

Answers

email_clean <- function(email, invalid = NA){
    email <- trimws(email)                                   # Remove leading/trailing whitespace
    email[nchar(email) %in% c(1, 2)] <- invalid              # Invalidate entries only 1 or 2 characters long
    email[!grepl("@", email)] <- invalid # <------------------ New line added here ------------
    bad_email <- c("\\@no.com", "\\@na.com", "\\@none.com", "\\@email.com",   # List of bad emails - modify to the
                   "\\@noemail.com", "\\@test.com")                           # specifications of the request

    pattern <- paste0("(?i)\\b", paste0(bad_email, collapse = "\\b|\\b"), "\\b")   # Case-insensitive alternation of the bad domains
    email <- gsub(pattern, invalid, sapply(email, as.character))   # With invalid = NA, entries matching a bad domain become NA
    unname(email)
}
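
As a quick sanity check of the added `grepl` line, the sketch below applies just that step (plus the whitespace trim) to a made-up vector. The addresses are invented for illustration and are not from the question's data.

```r
# Hypothetical sample input; these addresses are invented for illustration.
email <- c(" john@example.org ", "mywebsite.com", "jane@example.net", NA)

email <- trimws(email)             # strip surrounding whitespace
email[!grepl("@", email)] <- NA    # entries without "@" (including NA) become NA

print(email)
```

Because `grepl` returns `FALSE` for `NA` input, missing values are also marked invalid here rather than causing an error.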

That's great, thank you Pierre! – Maddie


Try this to exclude any rows of my_data that have no '@' symbol in the Email column:

my_data <- my_data[grep('@', my_data$Email), ] 
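
A minimal sketch of that row filter on a toy data frame may make the behavior easier to see. The `my_data` below is a hypothetical stand-in with invented values, and `keep` and `filtered` are names introduced only for this example.

```r
# Hypothetical stand-in for the question's my_data; all values are invented.
my_data <- data.frame(
  Name  = c("A", "B", "C"),
  Email = c("a@example.org", "mywebsite.com", "c@example.net"),
  stringsAsFactors = FALSE
)

# grep() returns the positions of the Email entries that contain "@"
keep <- grep("@", my_data$Email)

# Keep only those rows; the row without "@" is dropped entirely
filtered <- my_data[keep, ]

print(filtered)
```

Note that this drops whole rows, whereas `email_clean` keeps every row and only replaces the offending entry with `invalid`.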

I don't think grep works here, because I'm technically searching through a vector of emails, unless I'm missing something. – Maddie


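You can still use grep: Email[grep('@', Email)]. The grep function simply returns a vector of the indices where a match occurred. You can subset either a data frame or a vector based on that returned vector. – Gopala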
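
To illustrate the point about indices, this sketch compares `grep` (integer positions) with `grepl` (a logical mask) on an invented vector named `Email`. Both select the same elements; `idx` and `mask` are names introduced for the example.

```r
Email <- c("a@example.org", "mywebsite.com", "c@example.net")  # invented values

idx  <- grep("@", Email)    # integer positions of the matching elements
mask <- grepl("@", Email)   # logical vector, one TRUE/FALSE per element

print(idx)
print(mask)
print(Email[idx])           # same selection as Email[mask]
```

One caveat when inverting the selection: `Email[-grep(pattern, Email)]` returns an empty vector if nothing matches, so `Email[!grepl(pattern, Email)]` is the safer form for exclusion.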