LDA with the tm package in R using bigrams

I have a CSV file in which each row is a document, and I need to run LDA on it. I have the following code:

library(tm) 
library(SnowballC) 
library(topicmodels) 
library(RWeka) 

X = read.csv('doc.csv',sep=",",quote="\"",stringsAsFactors=FALSE) 

corpus <- Corpus(VectorSource(X)) 
corpus <- tm_map(tm_map(tm_map(corpus, stripWhitespace), tolower), stemDocument) 
corpus <- tm_map(corpus, PlainTextDocument) 
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2)) 
dtm <- DocumentTermMatrix(corpus, control = list(tokenize=BigramTokenizer,weighting=weightTfIdf))

Inspecting the DTM object at this point gives:

<<DocumentTermMatrix (documents: 52, terms: 477)>> 
Non-/sparse entries: 492/24312 
Sparsity   : 98% 
Maximal term length: 20 
Weighting   : term frequency - inverse document frequency (normalized) (tf-idf)

I then continue with:

rowTotals <- apply(dtm , 1, sum) 
dtm.new <- dtm[rowTotals> 0, ] 
g = LDA(dtm.new,10,method = 'VEM',control=NULL,model=NULL)

Running LDA produces the following error:

Error in LDA(dtm.new, 10, method = "VEM", control = NULL, model = NULL) : 
    The DocumentTermMatrix needs to have a term frequency weighting

The document-term matrix is clearly weighted. What exactly am I doing wrong?

Please help.

2015-06-11 dulla

Yes, dtm.new is still a DocumentTermMatrix object. – dulla

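The document-term matrix needs to have a term frequency weighting. LDA models raw term counts, so a matrix weighted with weightTfIdf is rejected; build it with weightTf (the default) instead: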

dtm <- DocumentTermMatrix(corpus,
                          control = list(tokenize = BigramTokenizer,
                                         weighting = weightTf))
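
For illustration, here is a minimal sketch of the whole pipeline with the weighting changed. It is not the original poster's data or exact code: the four toy documents are hypothetical stand-ins for the rows of doc.csv, stemming is left out to keep it short, `k = 2` and the seed are arbitrary, and the bigram tokenizer is a pure-R variant (built on `ngrams()` and `words()` from NLP, which tm attaches) swapped in for RWeka's `NGramTokenizer` so the example runs without Java. `VCorpus` is used on the assumption that a custom tokenizer should be applied to each document.

```r
library(tm)
library(topicmodels)

# Hypothetical toy documents standing in for the rows of doc.csv
docs <- c("the quick brown fox jumps over the lazy dog",
          "the lazy dog sleeps in the warm sun",
          "a quick brown fox runs through the forest",
          "the warm sun shines over the green forest")

# Build the corpus from a character vector, one element per document
corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, content_transformer(tolower))

# Pure-R bigram tokenizer (assumption: stands in for RWeka's NGramTokenizer)
BigramTokenizer <- function(x)
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "),
         use.names = FALSE)

# Raw term counts (weightTf) rather than tf-idf
dtm <- DocumentTermMatrix(corpus,
                          control = list(tokenize = BigramTokenizer,
                                         weighting = weightTf))

# The weighting attribute is what LDA() checks
print(attr(dtm, "weighting"))

# Drop documents that ended up with no terms
rowTotals <- slam::row_sums(dtm)
dtm.new <- dtm[rowTotals > 0, ]

# Fit a small model and show the top bigrams per topic
g <- LDA(dtm.new, k = 2, method = "VEM", control = list(seed = 1))
print(terms(g, 3))
```

If tf-idf is still wanted for filtering out uninformative terms, one option is to compute it on a separate copy of the matrix and use it only to choose which columns of the count matrix to keep before calling `LDA()`.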

2015-06-11 08:57:20 peterd
