How do I remove words from a wordcloud?

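I'm creating a wordcloud using the wordcloud package in R, with the help of "Word Cloud in R".

I can do this easily enough, but I want to remove certain words from the wordcloud. I have the words in a file (actually an Excel file, but I could change that), and I want to exclude all of them; there are a couple of hundred. Any suggestions?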

require(XML)
require(tm)
require(wordcloud)
require(RColorBrewer)
# build a corpus from the text in column 6 of data.merged2
ap.corpus <- Corpus(VectorSource(as.character(data.merged2[, 6])))
ap.corpus <- tm_map(ap.corpus, removePunctuation)
# content_transformer() keeps the corpus structure intact (needed in tm >= 0.6)
ap.corpus <- tm_map(ap.corpus, content_transformer(tolower))
ap.corpus <- tm_map(ap.corpus, removeWords, stopwords("english"))
# term frequencies, sorted from most to least frequent
ap.tdm <- TermDocumentMatrix(ap.corpus)
ap.m <- as.matrix(ap.tdm)
ap.v <- sort(rowSums(ap.m), decreasing = TRUE)
ap.d <- data.frame(word = names(ap.v), freq = ap.v)
table(ap.d$freq)

Source

2011-12-23 user1108155

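Instead of, or in addition to, `stopwords("english")`, add the stop words from the Excel file as well. You can combine the word vectors to make a single vector of stop words. Those words won't appear in the cloud. – 2011-12-23 20:25:14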
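The suggestion in this comment can be sketched as follows. This is a minimal illustration, assuming the tm package is installed; `custom_stopwords` and the sample sentence are made up here and stand in for the words read from the Excel file.

```r
library(tm)

# Hypothetical custom stop words; in practice these would come from the Excel file
custom_stopwords <- c("peanut", "cashew", "walnut")

# One combined character vector of standard and custom stop words
all_stopwords <- c(stopwords("english"), custom_stopwords)

# removeWords() also works on a plain character vector, which makes it easy to test
text <- "the peanut and the apple or the walnut"
cleaned <- removeWords(text, all_stopwords)

# Collapse the whitespace left behind by the removed words
cleaned <- trimws(gsub("\\s+", " ", cleaned))
print(cleaned)
```

The same combined vector could be passed to `tm_map(corpus, removeWords, all_stopwords)` so that both lists are removed in one step.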

@Tyler Rinker has already given the answer: just add another removeWords() line. Here are some more details.

Let's say your Excel file is called nuts.xls and has a single column of words, like this:

stopwords 
peanut 
cashew 
walnut 
almond 
macadamia

In R you could proceed like this:

    library(gdata) # package with xls import function
    library(tm)
    # now load the excel file with the custom stoplist, note a few of the arguments here
    # to clean the data by removing spaces that excel seems to insert and prevent it from
    # importing the characters as factors. You can use any args from read.table(), which is
    # handy
    nuts <- read.xls("nuts.xls", header=TRUE, stringsAsFactors=FALSE, strip.white=TRUE)

    # now make some words to build a corpus to test for a two-step stopword removal process... 
    words1<- c("peanut, cashew, walnut, macadamia, apple, pear, orange, lime, mandarin, and, or, but") 
    words2<- c("peanut, cashew, walnut, almond, apple, pear, orange, lime, mandarin, if, then, on") 
    words3<- c("peanut, walnut, almond, macadamia, apple, pear, orange, lime, mandarin, it, as, an") 
    words.all<-data.frame(rbind(words1,words2,words3)) 
    # note: tm >= 0.7 expects the data frame given to DataframeSource() to have doc_id and text columns
    words.corpus<-Corpus(DataframeSource(words.all))

    # now remove the standard list of stopwords, like you've already worked out 
    words.corpus.nostopwords <- tm_map(words.corpus, removeWords, stopwords("english")) 
    # now remove the second set of stopwords, this time your custom set from the excel file, 
    # note that it has to be a reference to a character vector containing the custom stopwords 
    words.corpus.nostopwords <- tm_map(words.corpus.nostopwords, removeWords, nuts$stopwords) 

    # have a look to see if it worked 
    inspect(words.corpus.nostopwords) 
    A corpus with 3 text documents 

    The metadata consists of 2 tag-value pairs and a data frame 
    Available tags are: 
      create_date creator 
    Available variables in the data frame are: 
      MetaID 

    $words1 
     , , , , apple, pear, orange, lime, mandarin, , , 

    $words2 
     , , , , apple, pear, orange, lime, mandarin, , , 

    $words3 
     , , , , apple, pear, orange, lime, mandarin, , ,

Success! The standard stopwords are gone, and so are the words from the custom list in the Excel file. No doubt there are other ways to do this.

Source

2011-12-24 08:33:42 Ben

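Thanks Ben and Tin Man. Some combination of the two solved it for me. I had trouble loading the xls with gdata because of masking, and then my problem became the extra spaces from Excel and cells containing multiple words. I appreciate it all though! Thanks! – user1108155 2011-12-27 17:12:17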
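The extra spaces and multi-word cells mentioned in this comment could be cleaned up before the list is handed to removeWords(). The sketch below uses base R only; `raw_cells` is made-up sample data standing in for the column read from the spreadsheet, not the poster's actual file.

```r
# Hypothetical cell contents as they might come out of a spreadsheet:
# padded with spaces, several words in one cell, mixed case, empty cells
raw_cells <- c(" peanut ", "cashew  walnut", "Almond", "")

# Trim each cell, then split on runs of whitespace so every word stands alone
words <- unlist(strsplit(trimws(raw_cells), "\\s+"))

# Drop empty strings, lower-case, and de-duplicate
words <- unique(tolower(words[nzchar(words)]))
print(words)
```

The resulting character vector can then be used as the custom stop word list.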

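Convert the data you want to build the word cloud from into a data frame. Create a CSV file with the words you want to remove and read it in as a data frame as well. Then you can use an anti_join to drop those words.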
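The answer does not include its code, so the following is only a sketch of what such an anti_join might look like, assuming the dplyr package. The data frames, the column name `word`, and the file name in the comment are all hypothetical.

```r
library(dplyr)

# Hypothetical word frequencies, one row per word
word_freq <- data.frame(
  word = c("apple", "peanut", "pear", "walnut"),
  freq = c(10, 8, 5, 3),
  stringsAsFactors = FALSE
)

# Words to remove; in practice this might be read from a CSV file, e.g.
# remove_words <- read.csv("remove_words.csv", stringsAsFactors = FALSE)
remove_words <- data.frame(
  word = c("peanut", "walnut"),
  stringsAsFactors = FALSE
)

# Keep only the rows of word_freq whose word has no match in remove_words
kept <- anti_join(word_freq, remove_words, by = "word")
print(kept)
```

The `kept` data frame can then supply the words and frequencies for `wordcloud()`.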

Source

2017-10-16 19:12:06
