2017-10-20 105 views
2

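R: how to replace a substring inside a longer string with a new substring

I have a long vector. Each element is a string, and each string can be split into substrings separated by ','.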

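I want to check whether each string in my vector contains at least one 'bad' string. If it does, the entire SUB-string that contains that 'bad' string should be replaced with a new string. I wrote a long function with loops, but I could swear there must be a simpler way to do this, maybe with stringr? Thanks a lot for your advice!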

# Create an example data frame: 
test <- data.frame(a = c("str1_element_1_aaa, str1_element_2", 
         "str2_element_1", 
         "str3_element_1, str3_element_2_aaa, str3_element_3"), 
        stringsAsFactors = FALSE) 
test 
str(test) 

# Defining my long function that checks if each string in a 
# vector contains a substring with a "bad" string in it. 
# If it does, that whole substring is replaced with a new string: 
library(stringr) 
mystring_replace <- function(strings_vector, badstring, newstring) { 
    with_string <- grepl(badstring, strings_vector) # which elements contain badstring? 
    # split the elements that contain badstring on ', ' 
    mysplits <- str_split(string = strings_vector[with_string], pattern = ', ') 
    for (i in seq_along(mysplits)) { # loop through the list of splits 
        allstrings <- mysplits[[i]] 
        for (ii in seq_along(allstrings)) { # loop through substrings 
            if (grepl(badstring, allstrings[ii])) mysplits[[i]][ii] <- newstring 
        } 
    } 
    for (i in seq_along(mysplits)) { # merge the split elements back together 
        mysplits[[i]] <- paste(mysplits[[i]], collapse = ", ") 
    } 
    strings_vector[with_string] <- unlist(mysplits) 
    return(strings_vector) 
} 
# Test 
mystring_replace(test$a, badstring = '_aaa', newstring = "NEW") 
+0

Instead of using 3 for loops, you could split on the bad string and join with the good string. – numbtongue

+0

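Good idea, but that won't help me. I don't want to join with a good string. I want to replace the WHOLE substring that contains the bad string with the new substring. – user3245256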
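
To make the distinction in the comment above concrete, here is a small sketch on the question's example data. It contrasts replacing only the bad string with replacing the whole comma-separated piece that contains it. The regular expression is an assumption for illustration, not something proposed in the thread: it only works if the pieces contain no spaces and if the bad string has no regex metacharacters.

```r
x <- c("str1_element_1_aaa, str1_element_2",
       "str2_element_1",
       "str3_element_1, str3_element_2_aaa, str3_element_3")

# Replacing only the bad string leaves the rest of the piece behind
naive <- gsub("_aaa", "NEW", x, fixed = TRUE)
naive[1]
# [1] "str1_element_1NEW, str1_element_2"

# Matching everything between the separators around the bad string
# replaces the whole piece (assumes pieces contain no spaces or commas)
whole <- gsub("[^, ]*_aaa[^,]*", "NEW", x)
whole
# [1] "NEW, str1_element_2"
# [2] "str2_element_1"
# [3] "str3_element_1, NEW, str3_element_3"
```

If the pieces can contain spaces, splitting on ', ' and rejoining, as in the answers, is the safer route.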

Answers

1

Maybe something like this would do it?

new_str_replace <- function(strings_vector, badstring, newstring){ 
    split.dat <- strsplit(strings_vector,', ')[[1]] 
    split.dat[grepl(badstring, split.dat)] <- newstring 
    return(paste(split.dat, collapse = ', ')) 
} 

results <- unname(sapply(test$a, new_str_replace, badstring = '_aaa', newstring = 'NEW')) 
results 
#[1] "NEW, str1_element_2"     "str2_element_1"      
#[3] "str3_element_1, NEW, str3_element_3" 
1

I did it the divide-and-conquer way. First I wrote a function that operates on just one string, and then I vectorized it.

library(stringr) # provides str_split() 

# does the operation for a single string only (divide and conquer) 
replace_one = function(string, badstring, newstring) { 
    # split it at ", " 
    strs = str_split(string, ", ")[[1]] 
    # an ifelse to find the ones containing badstring and replacing them 
    strs = ifelse(grepl(badstring, strs, fixed = TRUE), newstring, strs) 
    # join them again 
    paste0(strs, collapse = ", ") 
} 

# vectorizes it 
my_replace = Vectorize(replace_one, "string", USE.NAMES = FALSE) 
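
The answer stops short of calling the vectorized function, so here is a hedged usage sketch on the question's data. To keep the block free of package dependencies, base `strsplit()` stands in for `str_split()`; that swap and the `versions` vector are assumptions for illustration. The second half shows what `fixed = TRUE` buys you: the bad string is matched literally rather than as a regular expression.

```r
# Same idea as replace_one above; strsplit() is a base R stand-in for str_split()
replace_one <- function(string, badstring, newstring) {
  strs <- strsplit(string, ", ", fixed = TRUE)[[1]]
  strs <- ifelse(grepl(badstring, strs, fixed = TRUE), newstring, strs)
  paste0(strs, collapse = ", ")
}
my_replace <- Vectorize(replace_one, "string", USE.NAMES = FALSE)

x <- c("str1_element_1_aaa, str1_element_2",
       "str2_element_1",
       "str3_element_1, str3_element_2_aaa, str3_element_3")
result <- my_replace(x, badstring = "_aaa", newstring = "NEW")
result
# [1] "NEW, str1_element_2"
# [2] "str2_element_1"
# [3] "str3_element_1, NEW, str3_element_3"

# Hypothetical data: with fixed = TRUE the "." is a literal dot, not "any character"
versions <- "v1.0, v1x0"
literal <- my_replace(versions, badstring = "1.0", newstring = "NEW")
literal
# [1] "NEW, v1x0"
as_regex <- grepl("1.0", c("v1.0", "v1x0"))  # regex matching would flag both pieces
```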
1

Here is an approach that uses tidyverse, purrr, and stringr:

library(tidyverse) 
library(stringr) 

# Small utility function 
find_and_replace <- function(string, bad_string, replacement_string) { 
    ifelse(str_detect(string, bad_string), replacement_string, string) 
} 

str_split(test$a, ", ") %>%     
    map(find_and_replace, "aaa", "NEW") %>% 
    map_chr(paste, collapse = ", ") %>% 
    unlist 

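Basically: split the vector into a list, map find_and_replace over that list, and collapse the results. I suggest looking at the result after each pipe %>% separately.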
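
One way to follow that suggestion is to store each stage in its own variable and inspect it. The sketch below does this with base R stand-ins (`strsplit`, `lapply`, `vapply`) for `str_split`, `map`, and `map_chr`, so it runs without any packages. The stand-ins and the variable names are assumptions for illustration; the answer itself uses the tidyverse functions.

```r
x <- c("str1_element_1_aaa, str1_element_2",
       "str2_element_1",
       "str3_element_1, str3_element_2_aaa, str3_element_3")

# Stage 1: split each string into its pieces (a list with one vector per string)
pieces <- strsplit(x, ", ", fixed = TRUE)
pieces[[1]]
# [1] "str1_element_1_aaa" "str1_element_2"

# Stage 2: within each vector, swap any piece containing the bad string
replaced <- lapply(pieces, function(p) {
  ifelse(grepl("aaa", p, fixed = TRUE), "NEW", p)
})
replaced[[1]]
# [1] "NEW"            "str1_element_2"

# Stage 3: collapse each vector back into a single string
joined <- vapply(replaced, paste, character(1), collapse = ", ")
joined
# [1] "NEW, str1_element_2"
# [2] "str2_element_1"
# [3] "str3_element_1, NEW, str3_element_3"
```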

+0

I like it! Beautiful! Thanks! – user3245256

+0

Strange, I put it inside a function, but it doesn't work properly: – user3245256

+0

# Small utility function find_and_replace <- function(string, bad_string, replacement_string) { ifelse(str_detect(string, bad_string), replacement_string, string) } # Function: string_replace_n <- function(mystring, mybad_string, myreplacement) { out <- str_split(mystring, "") %>% map(find_and_replace, mybad_string, myreplacement) %>% map_chr(paste, collapse = ",") %>% unlist out } – user3245256
