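How to save tm_map() output to a csv file?

I am analyzing news articles from mashable.com. The data I have created looks like this (there are 14 articles for now, and the factor is either popular or not_popular):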

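id  content  factor
1  some text data  popular

I want to do supervised topic modelling on this data using Jonathan Chang's lda package. I tried some data preprocessing, and here is the script for that: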

require("ggplot2") 
require("grid") 
require("plyr") 
library(reshape) 
library(ScottKnott) 
setwd("~/Desktop") 
library(lda) 
library(tm) 
dataValues<- read.csv('Business.csv') 

dim(dataValues) 
## Text pre-processing.
## Create a corpus from the original data; VectorSource()
## interprets each element of the vector x as a document
CorpusObj<- VectorSource(dataValues$content); 
CorpusObj<-Corpus(CorpusObj); 
# remove \r and \n 
remove.carrigae <- function(x) gsub("[\r\n]", "", x) 
CorpusObj = tm_map(CorpusObj,remove.carrigae) 
#remove Hyperlinks 
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x) 
CorpusObj <- tm_map(CorpusObj, removeURL) 
#remove special char 
removeSPE <- function(x) gsub("[^a-zA-Z0-9]", " ", x) 
CorpusObj <- tm_map(CorpusObj, removeSPE) 
CorpusObj <- tm_map(CorpusObj, removePunctuation) 
CorpusObj <- tm_map(CorpusObj, removeNumbers) 
#CorpusObj <- tm_map(CorpusObj, removeWords, stopwords("english")) 
CorpusObj <- tm_map(CorpusObj, stemDocument, language = "english") #Stemming the words 
CorpusObj<-tm_map(CorpusObj,stripWhitespace) 
#CorpusObj <- tm_map(CorpusObj, tolower) # convert all text to lower case 
inspect(CorpusObj[14]) 

CorpusObj <- tm_map(CorpusObj, PlainTextDocument) 
#save in indiv text file 
writeCorpus(CorpusObj, path = "~/Desktop/untitled_folder") 
#write 1 file 
writeLines(as.character(CorpusObj), con="mycorpus.txt") 
inspect(CorpusObj[14])

I want to save the output of

CorpusObj <- tm_map(CorpusObj, PlainTextDocument)

as a .csv file, with each row (cell) holding one document. The function writeCorpus(CorpusObj, path = "~/Desktop/untitled_folder") only writes the last document to a text file.

Also, when I call corpusLDA <- lexicalize(CorpusObj) after applying PlainTextDocument, I get the following output: it has all the docs in [1:2, 1:6007], and the other 2 lists are empty.

Please guide me on where I am going wrong. Thank you.

Source

2016-08-02 karan kothari

For this to be reproducible, please provide "Businesses.csv" or use an example with built-in data.

https://drive.google.com/open?id=0BzodQ9yTFHfMVEhEeU1hZkd2ZEU The folder contains the R script, business.csv, and the output of the line writeCorpus(CorpusObj, path = "~/Desktop/untitled_folder"). Thanks @Hack-R

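When I inspect the .txt files created by this script, I see all the different documents. However, they are in a format that is not human-friendly.

Here is what I think you want: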

pacman::p_load("ggplot2", grid, plyr, reshape, ScottKnott, lda,tm) 

dataValues <- read.csv("business.csv") 
dim(dataValues) 
## Text pre-processing.
## Create a corpus from the original data; VectorSource()
## interprets each element of the vector x as a document
CorpusObj<- VectorSource(dataValues$content); 
CorpusObj<-Corpus(CorpusObj); 
# remove \r and \n 
remove.carrigae <- function(x) gsub("[\r\n]", "", x) 
CorpusObj = tm_map(CorpusObj,remove.carrigae) 
#remove Hyperlinks 
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x) 
CorpusObj <- tm_map(CorpusObj, removeURL) 
#remove special char 
removeSPE <- function(x) gsub("[^a-zA-Z0-9]", " ", x) 
CorpusObj <- tm_map(CorpusObj, removeSPE) 
CorpusObj <- tm_map(CorpusObj, removePunctuation) 
CorpusObj <- tm_map(CorpusObj, removeNumbers) 
#CorpusObj <- tm_map(CorpusObj, removeWords, stopwords("english")) 
CorpusObj <- tm_map(CorpusObj, stemDocument, language = "english") #Stemming the words 
CorpusObj<-tm_map(CorpusObj,stripWhitespace) 
#CorpusObj <- tm_map(CorpusObj, tolower) # convert all text to lower case 
inspect(CorpusObj[14]) 

CorpusObj <- tm_map(CorpusObj, PlainTextDocument) 
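# save each document in an individual text file (written to the working directory by default)
writeCorpus(CorpusObj)

# collect the text of every document, one row per document, and write a single csv
dataframe <- data.frame(text = unlist(sapply(CorpusObj, `[`, "content")), stringsAsFactors = FALSE)
write.csv(dataframe, "output.csv")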
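
A variation on the script above, as a minimal sketch rather than a drop-in replacement: with tm 0.6 and later, as far as I know, plain string functions passed to tm_map() on a VCorpus should be wrapped in content_transformer() so the elements stay text documents. The toy `docs` vector, the `id` and `content` column names, and the temp-file path are assumptions made for illustration; they stand in for dataValues$content and the real output location.

```r
library(tm)

# Toy stand-in for dataValues$content (hypothetical data, not the real business.csv)
docs <- c("First article\r\nwith a link http://example.com here!",
          "Second article, with 2 numbers: 10 and 20.")

corpus <- VCorpus(VectorSource(docs))

# Wrap plain string functions so tm_map() keeps each element a text document
remove_newlines <- content_transformer(function(x) gsub("[\r\n]", " ", x))
remove_urls     <- content_transformer(function(x) gsub("http[^[:space:]]*", "", x))

corpus <- tm_map(corpus, remove_newlines)
corpus <- tm_map(corpus, remove_urls)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, stripWhitespace)

# Collapse every document to a single string
clean_text <- vapply(corpus,
                     function(d) paste(as.character(d), collapse = " "),
                     character(1))

# One row per document; row.names = FALSE avoids an extra index column
out <- data.frame(id = seq_along(clean_text),
                  content = unname(clean_text),
                  stringsAsFactors = FALSE)
out_file <- file.path(tempdir(), "clean_corpus.csv")  # hypothetical output path
write.csv(out, out_file, row.names = FALSE)
```

Reading the file back with read.csv(out_file, stringsAsFactors = FALSE) should give one cleaned document per row in the content column.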

Source

2016-08-02 01:48:59

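This is what I wanted to get: a better version of this file. I tried different functions to write the preprocessed data to a csv file. My end goal is to preprocess the business.csv file and get a new, clean .csv, so I was trying to save the clean data somehow. Later I want to use this clean csv for supervised Latent Dirichlet Allocation in R. As I said before, passing the tm_map output to the function lexicalize() gave me what is in the image (given in the original comment): 'corpusLDA <- lexicalize(CorpusObj)'. I want each content section in one cell of the csv.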

@karankothari So you just want the text files to be easy to read, right?

Yes. Preprocessed, so that I can do sLDA on the clean data.
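
On the lexicalize() part: as far as I can tell from the lda documentation, lexicalize() expects a character vector with one string per document, not a tm corpus object, which is likely why passing CorpusObj directly gives one big matrix and empty entries. A minimal sketch, assuming the cleaned text is already available as a character vector (the `clean_text` values below are made up for illustration; in practice they could come from the content column of the cleaned csv):

```r
library(lda)

# Hypothetical cleaned documents, one string per document
clean_text <- c("stock market rise today",
                "market fall stock sell")

# lexicalize() splits on spaces and builds the vocabulary
lex <- lexicalize(clean_text, lower = TRUE)

length(lex$documents)    # one entry per input document
dim(lex$documents[[1]])  # 2 rows: 0-based word index and count
lex$vocab                # unique words found in the documents
```

The resulting lex$documents and lex$vocab are the inputs the samplers in the lda package work with; a supervised model additionally needs one label per document.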
