2017-02-15 26 views
library(tm) 
library(topicmodels) 
lda_topicmodel <- LDA(dtm, k = 20, control = list(seed = 1234))

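How do I convert LDA output to a word-topic matrix in R?

I ran Latent Dirichlet Allocation with the LDA function in R, and the fitted model is now an S4 object. How do I convert it into a word-topic matrix and a document-topic matrix in R?

Unfortunately, objects of type 'S4' are not subsettable, so I had to resort to copying part of the data to show it here.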

     Topic 1     Topic 2   Topic 3   Topic 4    Topic 5     Topic 6    Topic 7         Topic 8    Topic 9      Topic 10
[1,] "flooding"  "beach"   "sets"    "flooding" "storm"     "fwy"      "storms"        "flooding" "socal"      "rain"
[2,] "erosion"   "long"    "alltime" "just"     "flooding"  "due"      "thunderstorms" "via"      "major"      "california"
[3,] "cause"     "abc7"    "rain"    "almost"   "years"     "closures" "flash"         "public"   "throughout" "nearly"
[4,] "emergency" "day"     "slides"  "hardcore" "mudslides" "avoid"    "continue"      "asks"     "abc7"       "southern"
[5,] "highway"   "history" "last"    "spun"     "snow"      "latest"   "possible"      "call"     "streets"    "storms"

     Topic 11 Topic 12   Topic 13  Topic 14      Topic 15      Topic 16 Topic 17   Topic 18   Topic 19     Topic 20
[1,] "abc7"   "abc7"     "like"    "widespread"  "widespread"  "across" "rainfall" "flooding" "flooding"   "vehicles"
[2,] "beach"  "flooding" "closed"  "batters"     "biggest"     "can"    "record"   "region"   "storm"      "several"
[3,] "long"   "stranded" "live"    "california"  "evacuations" "stay"   "breaks"   "reported" "california" "getting"
[4,] "fwy"    "county"   "raining" "evacuations" "mudslides"   "home"   "long"     "corona"   "causes"     "floodwaters"
[5,] "710"    "san"      "blog"    "mudslides"   "years"       "wires"  "beach"    "across"   "related"    "stranded"

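The output above contains a subset of the words in each topic (image: LDA word-topic). I would like to write the contents of the S4 object to a CSV file as a word-topic matrix (image: Word-Topic Matrix).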

Happy to answer this question. However, could you provide a minimal data set, the results you obtained, and the results you expect? – lizzie

Please look here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example You should edit your original post to include that information. – lizzie

@lizzie. I want to convert the LDA result from its S4 format into a matrix and write it to a CSV file. Any ideas? – Sisir

Answers

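Since we can't reproduce your data, I used a data set available in R (AssociatedPress, which ships with the topicmodels package).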

# load the libraries 
library(topicmodels) 
library(tm) 

# load the data we'll be using 
data("AssociatedPress") 

# estimate an LDA model using the VEM algorithm (the default)
# I set k (the number of topics) to 2,
# just as an example
ap_lda <- LDA(AssociatedPress, 
       k = 2, 
       control = list(seed = 1234)) 

# get all the terms in a dataframe 
as.data.frame(terms(ap_lda, dim(ap_lda)[1])) 

The output will be:

  Topic 1    Topic 2
1 percent          i
2 million  president
3     new government
4    year     people
5 billion     soviet
6    last        new
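
The table above only ranks the terms within each topic. If what you need is the numeric word-topic and document-topic matrices, one possible approach is the `posterior()` function from topicmodels, which returns the per-topic word probabilities and the per-document topic probabilities as ordinary matrices. The following is a minimal sketch, assuming the same AssociatedPress model as above; the object names `word_topic` and `doc_topic` and the two CSV file names are placeholders I made up for illustration.

```r
library(topicmodels)

# same example data and model as above (k = 2 is only for illustration)
data("AssociatedPress")
ap_lda <- LDA(AssociatedPress, k = 2, control = list(seed = 1234))

# posterior() returns a list with two ordinary matrices:
#   $terms  : topics x terms       (word probabilities within each topic)
#   $topics : documents x topics   (topic probabilities within each document)
post <- posterior(ap_lda)

# transpose so that rows are words and columns are topics
word_topic <- t(post$terms)
doc_topic  <- post$topics

# write both matrices to CSV (file names are placeholders)
write.csv(word_topic, "word_topic_matrix.csv")
write.csv(doc_topic, "doc_topic_matrix.csv")
```

With your own model, replacing `ap_lda` with `lda_topicmodel` should give a matrix with one column per topic (20 in your case). Each column of `word_topic` and each row of `doc_topic` is a probability distribution, so it sums to 1.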