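Document classification with R

I have been trying to follow a document classification tutorial on YouTube using R. It is really interesting, but when I try to run the first part of the script I keep getting this error: `Error in FUN(c("obama", "romney")[[1L]], ...) : could not find function "corpus"`. I really don't know why this happens, and I hope someone can help me figure it out.

Here is the script: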
#init
libs <- c("tm", "plyr", "class")
lapply(libs, require, character.only = TRUE)
# set options
options(stringAsFactors = FALSE)
#set parameters
candidates <- c("obama","romney")
pathname <- "C:\\Users\\admin\\Documents\\speeches"
#clean text
cleanCorpus <- function(corpus){
  corpus.tmp <- tm_map(corpus, removePunctuation)
  corpus.tmp <- tm_map(corpus.tmp, stripWhitespace)
  corpus.tmp <- tm_map(corpus.tmp, tolower)
  corpus.tmp <- tm_map(corpus, removeWords, stopWords("english"))
  return(corpus.tmp)
}
#Build TDM
generateTDM <- function(cand, path){
  s.dir <- sprintf("%s/%s", path, cand)
  s.cor <- corpus(DirSource(directory = s.dir, encoding = "ANSI"))
  s.cor.cl <- cleanCorpus(s.cor)
  s.tdm <- TermDocumentMatrix(s.cor.cl)
  s.tdm <- removeSparseTerms(s.tdm, 0.7)
  result <- list(name = cand, tdm = s.tdm)
}
tdm <- lapply(candidates, generateTDM, path = pathname)
Well, at [8:50](http://www.youtube.com/watch?v=j1V2McKbkLo#t=530) it has 'Corpus', i.e. with an uppercase 'C'. – jbaums
('C' and 'c' are hard to tell apart in the default RStudio font.) – jbaums
Use Courier .... –
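
To make the point in the comments concrete: R is case-sensitive, and tm exports `Corpus()`, not `corpus()`. The script as posted has a few other slips of the same kind that would likely surface once the first error is fixed. The option is spelled `stringsAsFactors`, the tm function is `stopwords()`, and the last cleaning step starts again from `corpus` instead of `corpus.tmp`, which discards the earlier steps. Below is a minimal sketch of the script with those points changed. Several things in it are assumptions, not part of the question: the speeches are replaced by toy files written to a temporary folder, the encoding is set to "UTF-8" for those toy files, and `tolower` is wrapped in `content_transformer()`, which recent versions of tm provide for applying plain character functions through `tm_map()`.

```r
library(tm)

# Option name is stringsAsFactors (with an "s" after "string")
options(stringsAsFactors = FALSE)

# Clean text: every step continues from corpus.tmp
cleanCorpus <- function(corpus) {
  corpus.tmp <- tm_map(corpus, removePunctuation)
  corpus.tmp <- tm_map(corpus.tmp, stripWhitespace)
  # tolower is a base R function; content_transformer() wraps it for tm_map
  corpus.tmp <- tm_map(corpus.tmp, content_transformer(tolower))
  # stopwords() is all lower case in tm
  corpus.tmp <- tm_map(corpus.tmp, removeWords, stopwords("english"))
  corpus.tmp
}

# Build one term-document matrix per candidate
generateTDM <- function(cand, path) {
  s.dir <- file.path(path, cand)
  # Corpus() with an uppercase "C"; encoding here is an assumption for the toy files
  s.cor <- Corpus(DirSource(directory = s.dir, encoding = "UTF-8"))
  s.cor.cl <- cleanCorpus(s.cor)
  s.tdm <- TermDocumentMatrix(s.cor.cl)
  s.tdm <- removeSparseTerms(s.tdm, 0.7)
  list(name = cand, tdm = s.tdm)
}

# Toy data in a temporary folder (hypothetical stand-in for the real speeches)
candidates <- c("obama", "romney")
pathname <- file.path(tempdir(), "speeches")
for (cand in candidates) {
  dir.create(file.path(pathname, cand), recursive = TRUE, showWarnings = FALSE)
  writeLines(paste("A short speech by", cand, "about the economy."),
             file.path(pathname, cand, "speech1.txt"))
}

tdm <- lapply(candidates, generateTDM, path = pathname)
```

To run this against the real data, point `pathname` back at the speeches folder and set `encoding` to whatever the text files actually use.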