2017-08-17 30 views

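How do I search text data for occurrences of individual words?

How can I find the number of occurrences of each word in a list? I can search for a single word as follows: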

x <- dplyr::filter(data, grepl("apple", content, ignore.case = TRUE))
length(x$content)

Separating the words with | lets me total all occurrences together, but I want to count each word individually.

The words can be supplied as a row in a CSV file or written as a vector in R itself, for example:

words <- c("apple","orange","pear","pineapple") 

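One wrinkle is that data$content is a column of tweets, so a word can appear more than once per tweet. I only want to count a word once for each row in which it appears.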


See 'stringr::str_count' – www

Answers


You can get logical values for the presence or absence of each target word like this:

library(tidyverse)

words <- c("apple","orange","pear","pineapple")

data <- tibble(content = c("On my grocery list are green apples, red apples and oranges",
          "My favorite froyo flavors are pineapple, peach-pear and pear"))

boundary_words <- paste0("\\b", words) # if you want to avoid counting the apple in pineapple

# one logical column per word: TRUE if the row contains that word
map_dfc(set_names(boundary_words, words), ~ grepl(.x, data$content)) %>%
    bind_cols(data, .)

# A tibble: 2 x 5
  content                                                      apple orange pear  pineapple
  <chr>                                                        <lgl> <lgl>  <lgl> <lgl>
1 On my grocery list are green apples, red apples and oranges  TRUE  TRUE   FALSE FALSE
2 My favorite froyo flavors are pineapple, peach-pear and pear FALSE FALSE  TRUE  TRUE

Great, thanks. One extension I added was to name the object 'newdata' and count the matches per word with 'apply(X = newdata[9:12], 2, FUN = function(x) length(which(x == 'TRUE')))' –
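
The per-word totals described in this comment can also be obtained with colSums(), because R treats TRUE as 1 when summing. The following is only a minimal sketch: the data frame `presence` is a hypothetical stand-in for the answer's result, reduced to the four logical columns, and is not the commenter's actual 'newdata' object.

```r
words <- c("apple", "orange", "pear", "pineapple")

# Hypothetical stand-in for the answer's output: one logical column per word
presence <- data.frame(
  apple     = c(TRUE, FALSE),
  orange    = c(TRUE, FALSE),
  pear      = c(FALSE, TRUE),
  pineapple = c(FALSE, TRUE)
)

# TRUE counts as 1, so each column sum is the number of rows containing that word
rows_per_word <- colSums(presence[words])
print(rows_per_word)
```

Selecting the columns by name (`presence[words]`) rather than by position such as `[9:12]` should keep working if columns are added or reordered.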


Using the stringr package...

library(stringr)
words <- c("apple","orange","pear","pineapple")

data <- c("On my grocery list are green apples, red apples and oranges",
      "Oranges are my favourite, but I also like pineapples and pearls")

# count matches of each word in each string
sapply(words, function(w)
     str_count(str_to_lower(data), # set to lower case so capitalised words match
       paste0("\\b", w, "s?\\b"))) # adds word boundaries and optional plural -s

     apple orange pear pineapple
[1,]     2      1    0         0
[2,]     0      1    0         1

This allows for capital letters, and should only count whole words (perhaps with an -s plural). 
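
The question also asks for each word to be counted only once per row. A minimal base R sketch of that variant, assuming the same sample `data` and `words` as above (the names `patterns` and `rows_with_word` are introduced here for illustration), tests for presence with grepl() and then sums over rows:

```r
words <- c("apple", "orange", "pear", "pineapple")

data <- c("On my grocery list are green apples, red apples and oranges",
          "Oranges are my favourite, but I also like pineapples and pearls")

# Whole-word pattern with an optional plural -s for each target word
patterns <- paste0("\\b", words, "s?\\b")

# For each word, count the rows in which it appears at least once
rows_with_word <- sapply(patterns, function(p) {
  sum(grepl(p, data, ignore.case = TRUE, perl = TRUE))
})
names(rows_with_word) <- words
print(rows_with_word)
```

With this sample data, "apples" appears twice in the first string but contributes only 1 to the apple total, which is the once-per-row behaviour the question describes.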