2017-09-06

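I want to plot a network that shows how topics relate to one another, based on their word distributions. I am using this code ([source]: a plot of neighbouring topics).

Sample: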
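library(igraph)

# Posterior distributions from the fitted LDA model:
# post[["terms"]] is topics x terms, post[["topics"]] is documents x topics
post <- topicmodels::posterior(ldaOut)

# Correlate topics by their word probabilities; drop weak links and self-links
cor_mat <- cor(t(post[["terms"]]))
cor_mat[cor_mat < .05] <- 0
diag(cor_mat) <- 0

# Build an undirected, weighted graph from the lower triangle
graph <- graph_from_adjacency_matrix(cor_mat, weighted = TRUE, mode = "lower")
graph <- delete_edges(graph, E(graph)[weight < 0.05])

# Edge width follows correlation; vertex size follows total topic probability
E(graph)$edge.width <- E(graph)$weight * 20
V(graph)$label <- paste("Topic", V(graph))
V(graph)$size <- colSums(post[["topics"]]) * 15

par(mar = c(0, 0, 3, 0))
set.seed(110)
plot.igraph(graph, edge.width = E(graph)$edge.width,
    edge.color = "orange", vertex.color = "orange",
    vertex.frame.color = NA, vertex.label.color = "grey30")
title("Strength Between Topics Based On Word Probabilities", cex.main = .8)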

cor_mat data:

            1          2          3          4          5          6          7 ...
 1 0.00000000 0.00000000 0.00000000 0.09612831 0.00000000 0.17248020 0.00000000
 2 0.00000000 0.00000000 0.07206496 0.00000000 0.00000000 0.05755187 0.00000000
 3 0.00000000 0.07206496 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 4 0.09612831 0.00000000 0.00000000 0.00000000 0.08459681 0.00000000 0.06895900
 5 0.00000000 0.00000000 0.00000000 0.08459681 0.00000000 0.00000000 0.00000000
 6 0.17248020 0.05755187 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 7 0.00000000 0.00000000 0.00000000 0.06895900 0.00000000 0.00000000 0.00000000
 8 0.00000000 0.00000000 0.00000000 0.00000000 0.54849308 0.00000000 0.00000000
 9 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.09745720 0.00000000
10 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
11 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
12 0.00000000 0.00000000 0.00000000 0.10329825 0.00000000 0.14057310 0.00000000
13 0.14664201 0.00000000 0.00000000 0.00000000 0.05803984 0.00000000 0.00000000
14 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
15 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
16 0.00000000 0.00000000 0.10290656 0.00000000 0.00000000 0.00000000 0.06293238
17 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
18 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
19 0.00000000 0.00000000 0.00000000 0.00000000 0.33483481 0.00000000 0.00000000
20 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
21 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
22 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.27720724 0.00000000
23 0.12487435 0.14806837 0.00000000 0.10355990 0.00000000 0.05086977 0.00000000
24 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.06622769 0.00000000

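Unfortunately, the resulting plot looks like this: [image]

Any ideas on how to draw a more elegant topic network, one that shows the connections between topics instead of having them overlap each other?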

Please post example data for 'ldaOut'. – PoGibas

I updated the question with a data sample. – Sultan

What is 'post'? – PoGibas

Answer

A simple solution is to change the multipliers in weight*20 and colSums(post[["topics"]])*15 to smaller numbers, which avoids the overlap. The code could look like this:

...  
E(graph)$edge.width <- E(graph)$weight * 5
V(graph)$label <- paste("Topic", V(graph))
V(graph)$size <- colSums(post[["topics"]]) * 2
... 

This gives you the following result: [image]
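One reason a fixed multiplier is fragile: each row of post[["topics"]] is a document's topic distribution and sums to 1, so colSums(post[["topics"]]) adds up to the number of documents and the vertex sizes grow with the corpus. A possible alternative, sketched below, is to map the values onto a fixed range instead. The helper rescale_range and the ranges 5 to 20 and 1 to 8 are illustrative assumptions, not part of the original code, and the topics matrix here is a toy stand-in for the real posterior.

```r
# Hypothetical helper (not from the original post): linearly map x onto
# [lo, hi]; a constant vector is mapped to the midpoint of the range.
rescale_range <- function(x, lo, hi) {
  rng <- range(x)
  if (rng[1] == rng[2]) {
    return(rep((lo + hi) / 2, length(x)))
  }
  lo + (x - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
}

# Toy stand-in for post[["topics"]]: 200 documents x 4 topics, rows sum to 1.
set.seed(110)
raw <- matrix(runif(200 * 4), nrow = 200)
topics <- raw / rowSums(raw)

topic_mass <- colSums(topics)                 # totals add up to 200 documents
old_size <- topic_mass * 15                   # original scaling, grows with the corpus
new_size <- rescale_range(topic_mass, 5, 20)  # bounded whatever the corpus size

# A few edge weights taken from the cor_mat sample in the question.
weights <- c(0.09612831, 0.17248020, 0.54849308, 0.05755187)
new_width <- rescale_range(weights, 1, 8)

# With the graph from the question, the same idea would read:
#   V(graph)$size       <- rescale_range(colSums(post[["topics"]]), 5, 20)
#   E(graph)$edge.width <- rescale_range(E(graph)$weight, 1, 8)
```

With this approach the smallest topic always gets the lower bound and the largest the upper bound, so the plot stays readable if the model is refitted on more documents; the trade-off is that absolute sizes are no longer comparable across different plots.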