Using DocumentTermMatrix with the 'dictionary' parameter in R
I want to use R for text classification. I use DocumentTermMatrix to build the document-term matrix:
library(tm)
crude <- "japan korea usa uk albania azerbaijan"
corps <- Corpus(VectorSource(crude))
dtm <- DocumentTermMatrix(corps)
inspect(dtm)
words <- c("australia", "korea", "uganda", "japan", "argentina", "turkey")
test <- DocumentTermMatrix(corps, control=list(dictionary = words))
inspect(test)
The first inspect(dtm) works as expected:
    Terms
Docs albania azerbaijan japan korea usa
   1       1          1     1     1   1
But the second inspect(test) shows this result:
    Terms
Docs argentina australia japan korea turkey uganda
   1         0         1     0     1      0      0
whereas the expected result is:
    Terms
Docs argentina australia japan korea turkey uganda
   1         0         0     1     1      0      0
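As a cross-check on that expected row, here is a minimal base-R sketch that counts the dictionary terms by hand, without tm. It assumes plain whitespace tokenization, which is enough for this one-line document. The names `tokens` and `counts` are only illustrative and are not part of the tm API.

```r
# Same input as above, repeated so the snippet is self-contained
crude <- "japan korea usa uk albania azerbaijan"
words <- c("australia", "korea", "uganda", "japan", "argentina", "turkey")

# Split the document on single spaces (assumption: no punctuation to strip)
tokens <- strsplit(crude, " ", fixed = TRUE)[[1]]

# Count only dictionary terms; using a factor with sorted levels keeps
# zero-count terms and puts the columns in alphabetical order
counts <- table(factor(tokens[tokens %in% words], levels = sort(words)))
print(counts)
# japan and korea are counted once; argentina, australia, turkey, uganda are 0
```

One observation that may be relevant: the row that inspect(test) actually printed, 0 1 0 1 0 0, is what these counts look like when listed in the original, unsorted order of `words` (australia, korea, uganda, japan, argentina, turkey), so the column labels and the counts might be ordered differently.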
Is this a bug, or am I using it the wrong way?