2017-06-29

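Replace all instances of the words in one vector with the words specified in a second vector

I am trying to find an efficient way to remove every instance of a set of words from a list of input strings, where the words to remove are given as a second list.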

vectorOfWordsToRemove <- c('cat', 'monkey', 'wolf', 'mouses') 
vectorOfPhrases <- c('the cat and the monkey walked around the block', 'the wolf and the mouses ate lunch with the monkey', 'this should remain unmodified') 
remove_strings <- function(a, b) { stringr::str_replace_all(a,b, '')} 
remove_strings(vectorOfPhrases, vectorOfWordsToRemove) 

The output I want is:

vectorOfPhrases <- c('the and the walked around the block', 'the and the ate lunch with the', 'this should remain unmodified') 

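That is, every instance of every word in vectorOfWordsToRemove should be removed from vectorOfPhrases.

I can do this with a for loop, but it is slow, and it seems there should be a vectorized way to do it efficiently.

Thanks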

Answers

1

First, create a vector of empty strings to use as the replacements:

vectorOfNothing <- rep('', length(vectorOfWordsToRemove)) 

Then use the qdap library to substitute the vector of replacements for the vector of patterns:

library(qdap) 
vectorOfPhrases <- qdap::mgsub(vectorOfWordsToRemove, 
           vectorOfNothing, 
           vectorOfPhrases) 

> vectorOfPhrases 
[1] "the and the walked around the block" "the and the ate lunch with the"  
[3] "this should remain unmodified" 
1

You can use gsubfn():

library(gsubfn) 
replaceStrings <- as.list(rep("", length(vectorOfWordsToRemove))) 
newPhrases <- gsubfn("\\S+", setNames(replaceStrings, vectorOfWordsToRemove), vectorOfPhrases) 

> newPhrases 
[1] "the and the walked around the block" "the and the ate lunch with the"  
[3] "this should remain unmodified" 
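
For comparison, the same removal can probably be done in base R without extra packages by collapsing the words into a single alternation pattern and calling gsub() once over the whole vector. This is only a minimal sketch: remove_words is a hypothetical helper name that does not come from either answer. It assumes the words contain no regex metacharacters (otherwise they would need escaping), and that collapsing the leftover double spaces is acceptable.

```r
vectorOfWordsToRemove <- c('cat', 'monkey', 'wolf', 'mouses')
vectorOfPhrases <- c('the cat and the monkey walked around the block',
                     'the wolf and the mouses ate lunch with the monkey',
                     'this should remain unmodified')

# Hypothetical helper (not from the answers above): removes whole-word
# matches of `words` from every element of `phrases` in one vectorized pass.
remove_words <- function(phrases, words) {
  # Build one pattern such as "\\b(cat|monkey|wolf|mouses)\\b".
  # Assumption: the words contain no regex metacharacters.
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  removed <- gsub(pattern, "", phrases, perl = TRUE)
  # Removing a word leaves its surrounding spaces behind, so collapse
  # runs of whitespace and trim the ends.
  trimws(gsub("\\s+", " ", removed, perl = TRUE))
}

cleaned <- remove_words(vectorOfPhrases, vectorOfWordsToRemove)
print(cleaned)
```

The word boundaries (\\b) keep the pattern from deleting a word that only appears inside a longer one, for example the 'cat' in 'category'. Drop them if substring removal is what you want.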