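I am working on text mining in R. Here are a few documents from my corpus after removing punctuation, numbers, URLs, and stop words. The R code for this step: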
myStopwords <- setdiff(myStopwords, c("r", "big"))
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
myCorpus <- tm_map(myCorpus, stripWhitespace)
myCorpusCopy <- myCorpus
for (i in c(1:2, 320))
{
cat(paste0("[", i, "] "))
writeLines(strwrap(as.character(myCorpus[[i]]), 60))
}
[1] examples calling java code r
[2] simulating mapreduce r big data analysis using flights data
rbloggers
[320] r reference card data mining now cran lists many useful r
functions packages data mining applications
After that, I tried to stem the words and then complete the stems, as follows:
myCorpus <- tm_map(myCorpus, stemDocument)
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=myCorpusCopy)
When I run the for loop again, it prints NA for every document, as shown below:
for (i in c(1:2, 320))
{
cat(paste0("[", i, "] "))
writeLines(strwrap(as.character(myCorpus[[i]]), 60))
}
[1] NA
[2] NA
[320] NA
Any ideas? What am I doing wrong here?
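
One possible cause, offered as a sketch rather than a confirmed diagnosis: stemCompletion() in tm is documented to take a character vector of individual stems, while tm_map() hands it a whole document at a time, so the full text is treated as a single "stem" that matches nothing in the dictionary. The helper below, complete_stems, is a hypothetical name (it is not part of tm). The toy dictionary and the stemmed string are made up for illustration, and type = "first" is chosen here only to keep the result predictable.

```r
library(tm)

# Hypothetical helper (not part of tm): complete the stems of one document.
complete_stems <- function(text, dictionary) {
  # Split the document into individual stems and drop empty strings.
  stems <- unlist(strsplit(as.character(text), "[[:space:]]+"))
  stems <- stems[nzchar(stems)]
  if (length(stems) == 0) return("")
  # stemCompletion() looks up each stem separately in the dictionary.
  completed <- unname(stemCompletion(stems, dictionary = dictionary,
                                     type = "first"))
  # Keep the original stem when no completion is found.
  missing <- is.na(completed) | !nzchar(completed)
  completed[missing] <- stems[missing]
  paste(completed, collapse = " ")
}

# Toy data, assumed for illustration only.
dictionary <- c("examples", "calling", "java", "code", "r")
stemmed <- "exampl call java code r"
result <- complete_stems(stemmed, dictionary)
print(result)
```

Assuming a tm version that provides content_transformer(), the same helper could probably be applied to the whole corpus with tm_map(myCorpus, content_transformer(complete_stems), dictionary = myCorpusCopy) in place of the direct stemCompletion call.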