K-means clustering and text mining in R

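I have a txt file containing Twitter data (just one file). I loaded it into R using streamR and its parseTweets function. I need to run k-means clustering on this data. First I need to clean and prepare the data, but it is a mix of text and numbers, and R will not let me apply, for example, a content transformation.
How can I get rid of all the unwanted characters in this data? I only need plain text: no numbers, special characters, etc.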

library(streamR)
install.packages("RCurl")
install.packages("bitops")
install.packages("rjson")
library(bitops)
library(RCurl)
library(rjson)
library(NLP)
library(tm)
library(SnowballC)
library(XML)
tweets.df <- parseTweets('tweetsStream.txt', simplify = FALSE)
tweets.df <- tm_map(tweets.df, content_transformer(tolower))
Error in UseMethod("tm_map", x) :
    no applicable method for 'tm_map' applied to an object of class "data.frame"

Source

2016-11-26 Nithin Nampoothiry

The tm_map function expects a corpus as its input, not a data frame, which is why the call above fails. Try this:

library(tm)
# cname is the path to a directory of plain-text files
docs <- Corpus(DirSource(cname))
docs <- tm_map(docs, content_transformer(tolower))
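
In the question the tweets sit in a data frame rather than in a directory of files, so the corpus could also be built straight from the text column. The following is only a rough sketch: the small `tweets.df` stands in for the output of parseTweets (assumed to have a character column named `text`), and `keep_letters` is a hypothetical helper, not part of tm.

```r
library(tm)

# Hypothetical stand-in for the data frame returned by parseTweets();
# only a character column named `text` is assumed here.
tweets.df <- data.frame(
  text = c("Learning R in 2016!!! #rstats",
           "K-means clustering 101 :)",
           "Text mining with tm, 100% fun"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace every run of non-letter characters with a space.
keep_letters <- content_transformer(function(x) gsub("[^[:alpha:]]+", " ", x))

# tm_map() needs a corpus, so wrap the text column first.
docs <- VCorpus(VectorSource(tweets.df$text))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, keep_letters)
docs <- tm_map(docs, stripWhitespace)

# Pull the cleaned text back out as a plain character vector.
cleaned <- trimws(sapply(docs, as.character))
print(cleaned)
```

Wrapping `tolower` and `gsub` in `content_transformer` keeps each document a proper tm text document, which later steps such as building a document-term matrix rely on.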

A complete example can be found here: https://rstudio-pubs-static.s3.amazonaws.com/31867_8236987cf0a8444e962ccd2aec46d9c3.html
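
For the clustering step the question asks about, one possible approach is to turn the corpus into a document-term matrix and pass that to `kmeans`. This is a minimal sketch under stated assumptions: the four toy sentences in `texts` are made up for illustration, and the choice of two clusters is arbitrary.

```r
library(tm)

# Made-up example documents; real input would be the cleaned tweet text.
texts <- c("cats purr and cats sleep",
           "cats chase mice",
           "stocks rise as markets rally",
           "markets fall and stocks drop")

docs <- VCorpus(VectorSource(texts))

# Rows are documents, columns are terms, cells are term counts.
dtm <- DocumentTermMatrix(docs)
m <- as.matrix(dtm)

# k-means starts from random centers, so fix the seed for repeatability.
set.seed(42)
km <- kmeans(m, centers = 2, nstart = 10)

# One cluster label per document.
print(km$cluster)
```

On real tweets the matrix is usually very sparse, so it may be worth dropping rare terms (for example with `removeSparseTerms`) before clustering.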

For further details, you can always run ??tm_map or ??tm in your R console to search the full documentation.

Regards, Markus

Source

2016-11-26 10:51:18 molig
