R text mining: filtering strings from a text

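I would like to know whether there is an existing R function that, given a text and a list of strings as input, returns the strings from the list that are found in the text.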

For example,

x <- "This is a new way of doing things." 
mywords <- c("This is", "new", "not", "maybe", "things.") 
filtered_words <- Rfunc(x, mywords)

Then filtered_words would contain "This is", "new" and "things.".

Is there such a function?

2015-09-19 Stanley

We can use str_extract_all from library(stringr). The output is a list, which can be passed to unlist to convert it to a vector.

library(stringr) 
unlist(str_extract_all(x, mywords)) 
#[1] "This is" "new"  "things."
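
One caveat with the call above: str_extract_all treats each pattern as a regular expression, so the "." in "things." matches any character. A minimal sketch of literal matching, assuming stringr's fixed() modifier is acceptable here (the variable names x2, regex_hits and literal_hits are made up for illustration):

```r
library(stringr)

x <- "This is a new way of doing things."
mywords <- c("This is", "new", "not", "maybe", "things.")

# fixed() makes every pattern match literally, so "." only matches a real dot
literal_hits <- unlist(str_extract_all(x, fixed(mywords)))
literal_hits
# [1] "This is" "new"     "things."

# Illustration of the difference on a hypothetical second text
x2 <- "All thingsX here"
regex_hits  <- unlist(str_extract_all(x2, "things."))         # "thingsX"
strict_hits <- unlist(str_extract_all(x2, fixed("things.")))  # character(0)
```

For the text in the question both variants return the same three strings; the difference only shows up when a pattern contains regex metacharacters.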

2015-09-19 03:34:48 akrun

# Split the text on spaces and keep only the tokens that appear in mywords
filterWords <- function(x, mywords) {
    splitwords <- unlist(strsplit(x, split = " "))
    splitwords[splitwords %in% mywords]
}

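This is one way to do it. However, it will not find entries made up of two words, such as "This is", because the text is split on spaces before matching. Still, I think it may give you a better idea of how to approach your question.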
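
To also catch multi-word entries, one option is to test each entry against the whole text instead of splitting it first. The following is only a sketch in base R; filter_phrases is a hypothetical name, and it does plain substring matching, so "new" would also be reported for a text containing "renewal".

```r
# Hypothetical helper: keep the entries of mywords that occur literally in x
filter_phrases <- function(x, mywords) {
    # fixed = TRUE disables regex interpretation, so "things." needs a real dot
    found <- vapply(mywords, grepl, logical(1), x = x, fixed = TRUE,
                    USE.NAMES = FALSE)
    mywords[found]
}

x <- "This is a new way of doing things."
mywords <- c("This is", "new", "not", "maybe", "things.")

filter_phrases(x, mywords)
# [1] "This is" "new"     "things."
```

If whole-word matching is required, the entries would need to be wrapped in word-boundary regular expressions instead of being matched with fixed = TRUE.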

2015-09-19 09:06:16 Veera
