Count only the first instance of a keyword, without duplicate counts, in R

I have a list of keywords:

library(stringr) 
words <- as.character(c("decomposed", "no diagnosis","decomposition","autolysed","maggots", "poor body", "poor","not suitable", "not possible"))

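I want to match these keywords against the text in a data frame column (df$text) and record, in a separate data.frame (matchdf), how many times each keyword occurs: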

matchdf<- data.frame(Keywords=words) 
m_match<-sapply(1:length(words), function(x) sum(str_count(tolower(df$text),words[[x]]))) 
matchdf$matchs<-m_match

However, I noticed that this method counts every occurrence of a keyword within a field. For example:

"The sample was too decomposed to perform an analysis. The decomposed sample indicated that this animal was dead for a long time"

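would return a count of 2. However, I only want to count the first instance of "decomposed" within a field.

I thought there would be a way to count only the first instance using str_count, but there doesn't seem to be one.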

Source

2017-07-13 GISKid

Don't you want `str_detect` then? – CPak

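stringr is not required for this example; grepl from base R is sufficient. Both grepl and str_detect return TRUE or FALSE for each string rather than a number of occurrences, so a keyword that appears several times in one field still counts only once. That said, use str_detect instead of grepl if you prefer the package's functions (as @智樂 pointed out in the comments).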

library(stringr) 

words <- c("decomposed", "no diagnosis","decomposition","autolysed","maggots", 
      "poor body", "poor","not suitable", "not possible") 

df <- data.frame(text = "The sample was too decomposed to perform an analysis. The decomposed sample indicated that this animal was dead for a long time") 

matchdf <- data.frame(Keywords = words, stringsAsFactors = FALSE) 

# Base R: grepl() gives one TRUE/FALSE per row; sum() counts the rows that contain the keyword
matchdf$matches1 <- sapply(words, function(w) sum(grepl(w, tolower(df$text))), USE.NAMES = FALSE)

# stringr equivalent: str_detect() in place of grepl()
matchdf$matches2 <- sapply(words, function(w) sum(str_detect(tolower(df$text), w)), USE.NAMES = FALSE)

matchdf

Result

       Keywords matches1 matches2
1    decomposed        1        1
2  no diagnosis        0        0
3 decomposition        0        0
4     autolysed        0        0
5       maggots        0        0
6     poor body        0        0
7          poor        0        0
8  not suitable        0        0
9  not possible        0        0
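
The example above uses a single row. As a minimal sketch of how the same idea might extend to several rows, the snippet below counts, for each keyword, the number of rows that contain it at least once. The three-row data frame and the shortened keyword list are invented for illustration, and `fixed = TRUE` is an assumption that the keywords should be treated as literal strings rather than regular expressions.

```r
# Hypothetical three-row data frame, invented for illustration
df <- data.frame(
  text = c(
    "The sample was too decomposed to perform an analysis. The decomposed sample was old.",
    "Poor body condition; carcass decomposed.",
    "No diagnosis was possible."
  ),
  stringsAsFactors = FALSE
)

# Shortened keyword list, for brevity
words <- c("decomposed", "no diagnosis", "poor body", "poor")

# grepl() returns one TRUE/FALSE per row, so repeated hits inside a row
# count once; sum() then gives the number of rows containing the keyword
row_hits <- vapply(
  words,
  function(w) sum(grepl(w, tolower(df$text), fixed = TRUE)),
  integer(1)
)

matchdf <- data.frame(Keywords = words, matches = unname(row_hits),
                      stringsAsFactors = FALSE)
matchdf
```

Here "decomposed" gets a count of 2 (rows 1 and 2), even though it appears twice in row 1. Note that "poor" is matched as a substring, so any row containing "poor body" also counts toward "poor".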

Source

2017-07-13 22:19:10 Damian
