How to get word frequencies and the corresponding words in R

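I am working on a text mining project and have created a sparse matrix in R using the tm package. The data is in the following format (image of the data not preserved).

I need help wrangling the data.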

2016-12-07 Kshitiz Khatri

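Welcome to StackOverflow. Please take a look at how to produce a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). The tips on [asking a good question](http://stackoverflow.com/help/how-to-ask) may also be worth a read. – lmo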

One idea, using dplyr and tidyr:

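library(dplyr)
library(tidyr)

df %>%
  group_by(C1, C2, C3) %>%
  # sum every word column within each C1/C2/C3 group
  summarise(across(everything(), sum)) %>%
  # reshape the word columns into word/freq pairs
  gather(word, freq, not:great)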

#Source: local data frame [24 x 5]
#Groups: C1, C2 [4]

#      C1     C2    C3  word  freq
#   <dbl> <fctr> <dbl> <chr> <dbl>
#1      1      a     1   not     0
#2      1      a     2   not     1
#3      2      b     3   not     2
#4      2      d     2   not     0
#5      3      c     1   not     1
#6      3      c     2   not     0
#7      1      a     1  cant     1
#8      1      a     2  cant     0
#9      2      b     3  cant     0
#10     2      d     2  cant     0
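
For readers without dplyr and tidyr, roughly the same long table can be sketched in base R. This is only an illustration, not part of the approach above: the data frame is rebuilt inline from the dput() output in this answer, and the names words, wide, idx and long are made up for the example.

```r
# Same values as the dput() output in this answer, rebuilt for brevity
df <- data.frame(
  C1 = c(1, 2, 3, 2, 3, 2, 1),
  C2 = factor(c("a", "b", "c", "b", "c", "d", "a")),
  C3 = c(2, 3, 2, 3, 1, 2, 1),
  not = c(1, 1, 0, 1, 1, 0, 0),
  cant = c(0, 0, 0, 0, 1, 0, 1),
  able = c(1, 0, 0, 0, 0, 0, 0),
  great = c(0, 0, 0, 0, 0, 1, 1)
)

words <- c("not", "cant", "able", "great")

# Sum every word column within each C1/C2/C3 combination
wide <- aggregate(df[words], by = df[c("C1", "C2", "C3")], FUN = sum)

# aggregate() uses its own row order, so sort by the grouping columns
wide <- wide[order(wide$C1, wide$C2, wide$C3), ]

# Stack the word columns: repeat the group rows once per word
idx <- rep(seq_len(nrow(wide)), times = length(words))
long <- cbind(
  wide[idx, c("C1", "C2", "C3")],
  word = rep(words, each = nrow(wide)),
  freq = unlist(wide[words], use.names = FALSE)
)
rownames(long) <- NULL

print(head(long, 10))
```

The explicit order() call is there because aggregate() does not promise the same row order as group_by().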

DATA

dput(df) 
structure(list(C1 = c(1, 2, 3, 2, 3, 2, 1), C2 = structure(c(1L, 
2L, 3L, 2L, 3L, 4L, 1L), .Label = c("a", "b", "c", "d"), class = "factor"), 
    C3 = c(2, 3, 2, 3, 1, 2, 1), not = c(1, 1, 0, 1, 1, 0, 0), 
    cant = c(0, 0, 0, 0, 1, 0, 1), able = c(1, 0, 0, 0, 0, 0, 
    0), great = c(0, 0, 0, 0, 0, 1, 1)), .Names = c("C1", "C2", 
"C3", "not", "cant", "able", "great"), row.names = c(NA, -7L), class = "data.frame")
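
If only the overall frequency of each word is needed (ignoring C1, C2 and C3), a column sum may be enough. A minimal base R sketch, assuming the word columns from the data above; counts, totals and freq_table are hypothetical names used only here.

```r
# Word-count columns only, copied from the data above
counts <- data.frame(
  not = c(1, 1, 0, 1, 1, 0, 0),
  cant = c(0, 0, 0, 0, 1, 0, 1),
  able = c(1, 0, 0, 0, 0, 0, 0),
  great = c(0, 0, 0, 0, 0, 1, 1)
)

# Total occurrences of each word, most frequent first
totals <- sort(colSums(counts), decreasing = TRUE)

# Pair each word with its frequency in a two-column table
freq_table <- data.frame(word = names(totals), freq = unname(totals))

print(freq_table)
```

For a tm DocumentTermMatrix (say dtm, a hypothetical name), the same idea would be colSums(as.matrix(dtm)), assuming the matrix is small enough to convert to a dense one.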

2016-12-07 14:46:39 Sotos
