How do I search for the number of occurrences of individual words in text data?

How can I find the number of occurrences of each word in a list of words? I can search for a single word like this:

x <- dplyr::filter(data, grepl("apple", data$content, ignore.case = TRUE))
length(x$content)

Separating the words with | lets me total all the occurrences together, but I want to count each word individually.

The words could be supplied as rows in a csv file or written as a vector in R itself, for example:

words <- c("apple","orange","pear","pineapple")
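If the words come from a csv instead, a minimal sketch could look like the following. The csv layout (one word per row under a header named "word") and the names csv_text and word_table are assumptions for illustration; a real file path would normally be passed to read.csv() in place of the text argument.

```r
# Hypothetical csv content: a header line "word", then one word per row
csv_text <- "word\napple\norange\npear\npineapple"

# Read the csv and pull the column out as a plain character vector
word_table <- read.csv(text = csv_text, stringsAsFactors = FALSE)
words <- word_table$word

print(words)
```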

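One wrinkle is that data$content is a column of tweets, so a word can appear more than once per tweet. I only want to count a word once for each row in which it appears.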


2017-08-17 Manassa Mauler

See 'stringr::str_count' – www

You can get logical values for the presence/absence of your target words like this:

library(tidyverse)

words <- c("apple", "orange", "pear", "pineapple")

data <- tibble(content = c("On my grocery list are green apples, red apples and oranges",
                           "My favorite froyo flavors are pineapple, peach-pear and pear"))

# Prefix each word with a word boundary to avoid counting the apple in pineapple
boundary_words <- paste0("\\b", words)

# Build one logical column per word (TRUE if the word occurs in that row),
# then attach the columns to the original data
map(boundary_words, ~ grepl(.x, data$content)) %>%
    set_names(words) %>%
    as_tibble() %>%
    bind_cols(data, .)

# A tibble: 2 x 5
  content                                                      apple orange pear  pineapple
  <chr>                                                        <lgl> <lgl>  <lgl> <lgl>
1 On my grocery list are green apples, red apples and oranges  TRUE  TRUE   FALSE FALSE
2 My favorite froyo flavors are pineapple, peach-pear and pear FALSE FALSE  TRUE  TRUE


2017-08-17 17:34:21 Nate

Great, thanks. One extension I added was to name the object 'newdata' and use 'apply(X = newdata[9:12], 2, FUN = function(x) length(which(x == 'TRUE')))' –
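The column totals described in that comment can also be sketched with colSums, since TRUE counts as 1 when summed. The data frame presence below is a hypothetical stand-in for the logical columns, not the commenter's actual 'newdata' object.

```r
# Hypothetical stand-in for the logical presence/absence columns
presence <- data.frame(
  apple     = c(TRUE, FALSE),
  orange    = c(TRUE, FALSE),
  pear      = c(FALSE, TRUE),
  pineapple = c(FALSE, TRUE)
)

# Each column sum is the number of rows in which that word appears
rows_with_word <- colSums(presence)

print(rows_with_word)
```

Because each row contributes at most one TRUE per word, a word that is repeated within a single tweet is still counted only once.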

Using the stringr package...

library(stringr)

words <- c("apple", "orange", "pear", "pineapple")

data <- c("On my grocery list are green apples, red apples and oranges",
          "Oranges are my favourite, but I also like pineapples and pearls")

# For each word, count its matches in every string
sapply(words, function(w)
  str_count(str_to_lower(data),           # lower-case the text so capitals still match
            paste0("\\b", w, "s*\\b")))   # word boundaries plus an optional plural -s

     apple orange pear pineapple
[1,]     2      1    0         0
[2,]     0      1    0         1

This allows for capital letters, and should only count whole words (perhaps with an -s plural).
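The question asks for a word to count at most once per row, whereas the matrix above counts every occurrence. A minimal base R sketch of the once-per-row variant, reusing the same boundary pattern, might look like this; the names texts and rows_with_word are illustrative and not from the answer.

```r
words <- c("apple", "orange", "pear", "pineapple")

texts <- c("On my grocery list are green apples, red apples and oranges",
           "Oranges are my favourite, but I also like pineapples and pearls")

# For each word, count the rows in which it appears at least once
rows_with_word <- sapply(words, function(w) {
  pattern <- paste0("\\b", w, "s*\\b")  # whole word with an optional plural -s
  sum(grepl(pattern, texts, ignore.case = TRUE, perl = TRUE))
})

print(rows_with_word)
```

Here grepl returns one TRUE or FALSE per row, so the two mentions of "apples" in the first string contribute 1 rather than 2.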


2017-08-17 19:07:59
