2015-06-23 93 views

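R igraph adjacency matrix weighted graph - plot is not weighted

I want to plot a weighted graph of the terms used in tweets. Basically, I built a term-document matrix, removed the sparse terms, built an adjacency matrix of the remaining words, and now want to plot them. I can't figure out where the problem is. I tried to do exactly the same as: http://www.rdatamining.com/examples/text-mining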

Here is my code:

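library(tm)      # Corpus, TermDocumentMatrix, removeSparseTerms
library(igraph)  # graph.adjacency, layouts, plotting

tweet_corpus = Corpus(VectorSource(df$CONTENT)) 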
tdm = TermDocumentMatrix(
    tweet_corpus, 
    control = list(
     removePunctuation = TRUE, 
     stopwords = c("hehe", "haha", stopwords_phil, stopwords("english"), stopwords("spanish")), 
     removeNumbers = TRUE, tolower = TRUE) 
     ) 

m = as.matrix(tdm) 
termDocMatrix <- m 
termDocMatrix[5:10,1:20] 
      Docs 
Terms  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
    aabutin 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    aad  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    aaf  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    aali  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    aannacm 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    aantukin 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

myTdm2 <- removeSparseTerms(tdm, sparse =0.98) 
m2 <- as.matrix(myTdm2) 
m2[5:10,1:20] 
      Docs 
Terms  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
    filipino 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    give  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 
    god  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 
    good  0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 
    guy  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 
    haiyan 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 

myTdm2 
<<TermDocumentMatrix (terms: 34, documents: 27395)>> 
Non-/sparse entries: 39769/891661 
Sparsity   : 96% 
Maximal term length: 9 
Weighting   : term frequency (tf) 

termDocMatrix2 <- m2 
termDocMatrix2[termDocMatrix2>=1] <- 1 
termMatrix2 <- termDocMatrix2 %*% t(termDocMatrix2) 
termMatrix2[5:10,5:10] 
      Terms 
Terms  disaster give god good guy test 
    disaster  623 6 53 11 4  19 
    give   6 592 98 16 8  6 
    god   53 98 2679 135 38  29 
    good   11 16 135 816 21  5 
    guy    4 8 38 21 637  5 
    test   19 6 29 5 5 610 
g2 <- graph.adjacency(termMatrix2, weighted=TRUE, mode="undirected") 
g2 <- simplify(g2)  # remove self-loops (the diagonal of termMatrix2)
V(g2)$label <- V(g2)$name 
V(g2)$degree <- degree(g2) 
set.seed(3952) 
layout1 <- layout.fruchterman.reingold(g2) 
plot(g2, layout=layout1) 
plot(g2, layout=layout.kamada.kawai) 
V(g2)$label.cex <- 2.2 * V(g2)$degree/max(V(g2)$degree)+ .2 
V(g2)$label.color <- rgb(0, 0, .2, .8) 
V(g2)$frame.color <- NA 
egam <- (log(E(g2)$weight)+.4)/max(log(E(g2)$weight)+.4) 
E(g2)$color <- rgb(.5, .5, 0, egam) 
E(g2)$width <- egam 
plot(g2, layout=layout1) 

The result looks like this: [image]

But I would like to get something like this: [image]

Apparently the weighting does not work, but why?!

Thanks in advance, everyone!


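I'm not sure, but could the reason be that every word shares at least one connection with every other word? Even so, there should still be a weighted component, since some words co-occur with others more than 60 times, and so on. – user3815852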
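One way to check this hypothesis is to look at the edge density of the graph. The following is only a minimal sketch: it uses the 6×6 excerpt of termMatrix2 printed in the question (not the full 34-term matrix, so it proves nothing about the complete graph), the variable names tm_excerpt, g and dens are made up for illustration, and it uses the current igraph names graph_from_adjacency_matrix() and edge_density() rather than the older graph.adjacency().

```r
library(igraph)

# Assumption: only the 6x6 excerpt shown in the question, typed in by hand.
terms <- c("disaster", "give", "god", "good", "guy", "test")
tm_excerpt <- matrix(c(
  623,   6,   53,  11,   4,  19,
    6, 592,   98,  16,   8,   6,
   53,  98, 2679, 135,  38,  29,
   11,  16,  135, 816,  21,   5,
    4,   8,   38,  21, 637,   5,
   19,   6,   29,   5,   5, 610
), nrow = 6, byrow = TRUE, dimnames = list(terms, terms))

# diag = FALSE skips the diagonal, so no self-loops are created.
g <- graph_from_adjacency_matrix(tm_excerpt, weighted = TRUE,
                                 mode = "undirected", diag = FALSE)

# Fraction of possible edges that are present; 1 means every pair is linked.
dens <- edge_density(g)
print(dens)

# The weights themselves still vary a lot between pairs.
print(range(E(g)$weight))
```

If the density is at or near 1, the edges alone carry almost no structure, so the picture depends almost entirely on whether the weights are actually passed to the layout.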

Answer


Even if your graph is weighted, the layout algorithm will not use the weights unless you explicitly tell it to. Try this:

layout1 <- layout.fruchterman.reingold(g2, weights=E(g2)$weight) 

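However, if your weights differ wildly in magnitude, it is usually better to use the logarithm of the weights (plus some constant, so that all of them are strictly positive) as the input to the layout algorithm.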


That doesn't work... it produces the same output. How would I use the logarithm of the weights? – user3815852


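Create a vector containing the logarithms of the weights, then pass it to the 'weights' argument. Also, make sure that you really use 'layout1' as the layout when you plot the graph: the sample code above calls 'plot()' several times, so you may be looking at the wrong plot. –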
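The advice above could be sketched roughly as follows. This is an illustration under stated assumptions, not a confirmed fix for the asker's data: it reuses only the 6×6 excerpt of termMatrix2 from the question, the name log_w and the added constant 1 are arbitrary choices made here, and layout_with_fr() is the current igraph name for layout.fruchterman.reingold().

```r
library(igraph)

# Assumption: the 6x6 excerpt from the question stands in for the full matrix.
terms <- c("disaster", "give", "god", "good", "guy", "test")
termMatrix2 <- matrix(c(
  623,   6,   53,  11,   4,  19,
    6, 592,   98,  16,   8,   6,
   53,  98, 2679, 135,  38,  29,
   11,  16,  135, 816,  21,   5,
    4,   8,   38,  21, 637,   5,
   19,   6,   29,   5,   5, 610
), nrow = 6, byrow = TRUE, dimnames = list(terms, terms))

g2 <- graph_from_adjacency_matrix(termMatrix2, weighted = TRUE,
                                  mode = "undirected", diag = FALSE)

# Co-occurrence counts on existing edges are >= 1, so log() is >= 0;
# adding a constant (1 is an arbitrary choice) makes every value strictly positive.
log_w <- log(E(g2)$weight) + 1

set.seed(3952)
layout1 <- layout_with_fr(g2, weights = log_w)

# A single plot() call that explicitly uses layout1, so there is no doubt
# about which layout is being displayed.
plot(g2, layout = layout1, edge.width = log_w)
```

Keeping just one plot() call per layout makes it easier to tell whether the weighted layout actually changes the picture.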