我想繪製一條用於推文中的術語的加權圖。基本上我做了一個術語文件矩陣;刪除了稀疏詞彙;建立剩餘單詞的adjazenzmatrix,並想繪製它們。 我找不出問題在哪裏。試着做完全一樣:http://www.rdatamining.com/examples/text-miningR igraph Adjazenzmatrix加權圖 - 繪圖不加權
這裏是我的代碼:
tweet_corpus = Corpus(VectorSource(df$CONTENT))
tdm = TermDocumentMatrix(
tweet_corpus,
control = list(
removePunctuation = TRUE,
stopwords = c("hehe", "haha", stopwords_phil, stopwords("english"), stopwords("spanish")),
removeNumbers = TRUE, tolower = TRUE)
)
m = as.matrix(tdm)
termDocMatrix <- m
termDocMatrix[5:10,1:20]
Docs
Terms 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
aabutin 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aad 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aaf 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aali 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aannacm 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aantukin 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
myTdm2 <- removeSparseTerms(tdm, sparse =0.98)
m2 <- as.matrix(myTdm2)
m2[5:10,1:20]
Docs
Terms 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
filipino 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
give 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
god 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
good 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
guy 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
haiyan 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
myTdm2
<<TermDocumentMatrix (terms: 34, documents: 27395)>>
Non-/sparse entries: 39769/891661
Sparsity : 96%
Maximal term length: 9
Weighting : term frequency (tf)
termDocMatrix2 <- m2
termDocMatrix2[termDocMatrix2>=1] <- 1
termMatrix2 <- termDocMatrix2 %*% t(termDocMatrix2)
termMatrix2[5:10,5:10]
Terms
Terms disaster give god good guy test
disaster 623 6 53 11 4 19
give 6 592 98 16 8 6
god 53 98 2679 135 38 29
good 11 16 135 816 21 5
guy 4 8 38 21 637 5
test 19 6 29 5 5 610
g2 <- graph.adjacency(termMatrix2, weighted=T, mode="undirected")
g2 <- simplify(g2)
V(g)$label <- V(g)$name
V(g2)$label <- V(g2)$name
V(g2)$degree <- degree(g2)
set.seed(3952)
layout1 <- layout.fruchterman.reingold(g2)
plot(g2, layout=layout1)
plot(g2, layout=layout.kamada.kawai)
V(g2)$label.cex <- 2.2 * V(g2)$degree/max(V(g2)$degree)+ .2
V(g2)$label.color <- rgb(0, 0, .2, .8)
V(g2)$frame.color <- NA
egam <- (log(E(g2)$weight)+.4)/max(log(E(g2)$weight)+.4)
E(g2)$color <- rgb(.5, .5, 0, egam)
E(g2)$width <- egam
plot(g2, layout=layout1)
這則看起來像:
,但我想有這樣的事情:
顯然稱重不起作用 - 但爲什麼?!
謝謝各位提前!
我不確定,但是原因是,所有單詞至少共享一個連接?但仍然應該有加權的組成部分,因爲一些單詞出現與其他人超過60倍等 – user3815852