2013-02-15 131 views
3

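I have data, a character vector (eventually I'll collapse it, so I don't care whether it stays a vector or is treated as a single string), a vector of patterns, and a vector of replacements. I want each pattern in the data to be replaced by its respective replacement. I got it done with stringr and a for loop, but is there a more R-like way to do this, one that avoids a loop for the string replacement?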

library(stringr)
start_string <- sample(letters[1:10], 10)
my_pattern <- c("a", "b", "c", "z")
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]")
str_replace(start_string, pattern = my_pattern, replacement = my_replacement)
# bad lengths, doesn't work

str_replace(paste0(start_string, collapse = ""),
            pattern = my_pattern, replacement = my_replacement)
# vector output, not what I want in this case

my_result <- start_string
for (i in seq_along(my_pattern)) {
    my_result <- str_replace(my_result,
                             pattern = my_pattern[i], replacement = my_replacement[i])
}
my_result
#  [1] "[this was a c]"  "[this was an a]" "e"  "g"  "h"  "[this was a b]"
#  [7] "d"  "j"  "f"  "i"

# This is what I want, but is there a better way?

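In my case, I know each pattern will occur at most once, but not every pattern will occur. I know I could use str_replace_all if a pattern might occur more than once; I hope a solution would offer that option too. I would also like a solution that uses my_pattern and my_replacement, so that it can be part of a function that takes those vectors as arguments.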

+1

What's wrong with the for loop? They're well suited to this kind of situation, where you are iteratively modifying a vector. – hadley 2013-02-16 14:38:56

Answers

3

I'd bet there is another way to do this, but the first thing that came to mind was gsubfn:

my_repl <- function(x) {
    switch(x,
           a = "[this was an a]",
           b = "[this was a b]",
           c = "[this was a c]",
           z = "[this was a z]")
}

library(gsubfn)
start_string <- sample(letters[1:10], 10)
gsubfn("a|b|c|z", my_repl, x = start_string)

If the patterns you are searching for are acceptable as valid names for list elements, this will also work:

names(my_replacement) <- my_pattern 
gsubfn("a|b|c|z",as.list(my_replacement),start_string) 

Edit

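But frankly, if I really had to do this a lot in my own code, I would probably just write a function that wraps the for loop. Here is a simple version that uses sub and gsub rather than the functions from stringr: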

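vsub <- function(pattern, replacement, x, all = TRUE, ...) {
    # gsub replaces every match, sub only the first one
    FUN <- if (all) gsub else sub
    # apply the pattern/replacement pairs one after another, in order
    for (i in seq_len(min(length(pattern), length(replacement)))) {
        x <- FUN(pattern = pattern[i], replacement = replacement[i], x, ...)
    }
    x
}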

vsub(my_pattern,my_replacement,start_string) 
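
As a small illustration of the `all` switch and the `...` pass-through in this wrapper, here is a sketch with made-up inputs ("banana" and "a.b" are not from the question). The wrapper is repeated only so the snippet runs on its own.

```r
# Same wrapper as above, repeated so this snippet is self-contained.
vsub <- function(pattern, replacement, x, all = TRUE, ...) {
    FUN <- if (all) gsub else sub
    for (i in seq_len(min(length(pattern), length(replacement)))) {
        x <- FUN(pattern = pattern[i], replacement = replacement[i], x, ...)
    }
    x
}

# Illustrative inputs, chosen only for this example.
every   <- vsub("a", "o", "banana")               # gsub: "bonono"
first   <- vsub("a", "o", "banana", all = FALSE)  # sub:  "bonana"
literal <- vsub(".", "!", "a.b", fixed = TRUE)    # fixed = TRUE is forwarded: "a!b"
```

So `all = FALSE` roughly plays the role of str_replace and `all = TRUE` that of str_replace_all, while extra arguments such as `fixed` or `ignore.case` go straight through to sub/gsub.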

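But of course, one reason there is no well-known built-in function for this is probably that sequential replacements like this are bound to be fragile, because they depend so heavily on order: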

vsub(rev(my_pattern),rev(my_replacement),start_string) 
[1] "i"           "[this w[this was an a]s [this was an a] c]" 
[3] "[this was an a]"       "g"           
[5] "j"           "d"           
[7] "f"           "[this w[this was an a]s [this was an a] b]" 
[9] "h"           "e"  
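
To see where the mangled strings come from, the two orderings can be traced by hand on a single element. This is only a sketch using plain nested gsub calls on the input "b"; it reuses the replacement strings from the question.

```r
# "a" first, then "b": the "a" step finds nothing in "b",
# and the text inserted by the "b" step is never rescanned for "a".
forward <- gsub("b", "[this was a b]",
                gsub("a", "[this was an a]", "b"))

# "b" first, then "a": the "a" step now also rewrites the letters "a"
# inside the text that the "b" step has just inserted.
backward <- gsub("a", "[this was an a]",
                 gsub("b", "[this was a b]", "b"))

forward   # "[this was a b]"
backward  # "[this w[this was an a]s [this was an a] b]"
```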
+0

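Thanks, this definitely avoids the loop (so it meets all the criteria I mentioned), but in my real case I have enough patterns and replacements (nothing huge, only 15 or so) that I'd rather not write them all into a switch statement. – Gregor 2013-02-15 22:53:24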

+0

@shujaa There is another option, but only if the search strings are acceptable as list item names (see my edit). – joran 2013-02-15 22:58:30

1

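Here is an option based on gregexpr, regmatches, and regmatches<-. Note that there is a limit on the length of a regular expression that can be matched, so this will not work if you try to combine too many long patterns into one.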

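replaceSubstrings <- function(patterns, replacements, X) {
    # combine all patterns into a single alternation and match in one pass
    pat <- paste(patterns, collapse = "|")
    m <- gregexpr(pat, X)
    # look up each matched substring in patterns to pick its replacement
    regmatches(X, m) <-
        lapply(regmatches(X, m),
               function(XX) replacements[match(XX, patterns)])
    X
}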

## Try it out 
patterns <- c("cat", "dog") 
replacements <- c("tiger", "coyote") 
sentences <- c("A cat", "Two dogs", "Raining cats and dogs") 
replaceSubstrings(patterns, replacements, sentences) 
## [1] "A tiger"     "Two coyotes"    
## [3] "Raining tigers and coyotes"
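
Because this approach matches everything in one pass, inserted text is not rescanned, so the order of the pattern vector should not matter. The sketch below checks that on a small made-up input (`x` is illustrative). It assumes the patterns are literal strings with no regex metacharacters, since `match()` compares the matched text with the pattern strings themselves.

```r
# Same function as above, repeated so this snippet is self-contained.
replaceSubstrings <- function(patterns, replacements, X) {
    pat <- paste(patterns, collapse = "|")
    m <- gregexpr(pat, X)
    regmatches(X, m) <-
        lapply(regmatches(X, m),
               function(XX) replacements[match(XX, patterns)])
    X
}

my_pattern <- c("a", "b", "c", "z")
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]")
x <- c("a", "b", "j")  # illustrative input; "j" matches no pattern

forward  <- replaceSubstrings(my_pattern, my_replacement, x)
backward <- replaceSubstrings(rev(my_pattern), rev(my_replacement), x)

forward                      # "[this was an a]" "[this was a b]" "j"
identical(forward, backward) # TRUE
```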