Plotting the words most highly correlated with a specific word of interest

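I want to plot the highest correlations for a word. For example, I want to plot the ten strongest correlations for the word "whale". Can anyone help me with a problem like this? In case it helps, I have Rgraphviz installed.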

library(tm)

s.dir1 <- "/PATHTOTEXT/MobyDickTxt"

# Read the plain-text files and clean them up
s.cor1 <- Corpus(DirSource(s.dir1), readerControl = list(reader = readPlain))
s.cor1 <- tm_map(s.cor1, removePunctuation)
s.cor1 <- tm_map(s.cor1, stripWhitespace)
s.cor1 <- tm_map(s.cor1, content_transformer(tolower)) # wrapper needed in tm >= 0.6
s.cor1 <- tm_map(s.cor1, removeNumbers)
s.cor1 <- tm_map(s.cor1, removeWords, stopwords("english"))
tdm1 <- TermDocumentMatrix(s.cor1)

# Word frequencies, most frequent first
m1 <- as.matrix(tdm1)
v1 <- sort(rowSums(m1), decreasing = TRUE)
d1 <- data.frame(word = names(v1), freq = v1)

Source

2013-10-23 user2890975

What kind of plot? You need to be more specific than that.

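I really don't have a preference. I'm presenting some research that looks at associations between emotion words in historical documents, so anything that lets audience members examine the relationships closely works for me. – user2890975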

Then I'd recommend a dotplot. Use your Google-fu on R and dotplot, and try to work it out yourself.

Here is one way to compute the words associated with a given word in a corpus, and to plot those words along with their correlations.

Get some example data...

require(tm) 
data("crude") 
tdm <- TermDocumentMatrix(crude)

Compute the correlations and store them in a data frame...

toi <- "oil" # term of interest
corlimit <- 0.7 # lower bound on the correlation
# In recent tm versions findAssocs() returns a list; [[1]] is a named numeric vector
assocs <- findAssocs(tdm, toi, corlimit)[[1]]
oil_0.7 <- data.frame(corr = assocs, terms = names(assocs))
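
As far as I understand it, findAssocs reports the correlation between the term of interest's row in the term-document matrix and every other term's row, keeping those at or above the limit. A minimal base-R sketch of that idea on a made-up matrix (the terms, counts, and the name `assoc_toy` are invented for illustration, and this is not tm's actual implementation):

```r
# Toy term-document matrix: rows are terms, columns are documents (made-up counts).
m_toy <- matrix(c(1, 2, 3, 4,
                  2, 4, 6, 8,
                  4, 3, 2, 1,
                  1, 0, 1, 0),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("oil", "opec", "wheat", "ship"),
                                paste0("doc", 1:4)))

toi_toy <- "oil"  # term of interest
limit   <- 0.7    # lower bound on the correlation

# cor() correlates columns, so transpose to put the terms in columns.
r <- cor(t(m_toy))[toi_toy, ]
r <- r[names(r) != toi_toy]  # drop the term's correlation with itself

# Keep terms at or above the limit, strongest first, rounded to two decimals.
assoc_toy <- sort(round(r[r >= limit], 2), decreasing = TRUE)
print(assoc_toy)
```

Only "opec" survives here, because its counts rise and fall with those of "oil" across the four documents.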

Turn the terms into a factor so that ggplot keeps the data frame's ordering...

oil_0.7$terms <- factor(oil_0.7$terms ,levels = oil_0.7$terms)

Draw the plot...

require(ggplot2) 
ggplot(oil_0.7, aes(y = terms )) + 
    geom_point(aes(x = corr), data = oil_0.7) + 
    xlab(paste0("Correlation with the term ", "\"", toi, "\""))
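
The question asks for the ten strongest correlations rather than everything above a cutoff. One way to get there, sketched under the assumption that you have a data frame with `corr` and `terms` columns like the one above (`top_assocs` is a hypothetical helper name, not part of tm or ggplot2), is to sort by correlation and keep the first n rows before plotting:

```r
# Hypothetical helper: keep the n strongest associations from a data frame
# with columns `corr` and `terms`.
top_assocs <- function(df, n = 10) {
  df <- df[order(df$corr, decreasing = TRUE), , drop = FALSE]
  df <- head(df, n)
  # ggplot draws factor levels bottom-to-top on the y axis, so reverse the
  # levels to put the strongest association at the top of the dotplot.
  df$terms <- factor(df$terms, levels = rev(as.character(df$terms)))
  df
}

# Made-up data standing in for the findAssocs() result
toy <- data.frame(corr = c(0.71, 0.93, 0.80), terms = c("a", "b", "c"))
top2 <- top_assocs(toy, n = 2)
print(top2)
```

With real data you would probably need a lower corlimit so that at least ten terms come back, and then pass the result of `top_assocs(oil_0.7)` to the ggplot call in place of `oil_0.7`.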


Source

2013-11-12 09:30:21 Ben

This answer inspired the plot method for qdap's `word_cor` function. I've credited you, but as Ben from SO. If you'd like your full name used instead, please email me.

The code snippet doesn't work :(

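It's a very small glitch: corr and terms end up with different lengths. Using row.names instead of names works, and then I only had to rename the first variable to corr and that's it :)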
