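Problem with Naive Bayes

I am trying to run Naive Bayes in R to make predictions from text data (by building a document-term matrix).
I read several warnings about terms that may be missing from the training or test set, so I decided to build everything from a single data frame and split it afterwards. The code I am using is this: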
data <- read.csv(file="path",header=TRUE)
########## NAIVE BAYES
library(e1071)
library(SparseM)
library(tm)
# CREATE DATA FRAME AND TRAINING AND
# TEST INCLUDING 'Text' AND 'InfoType' (columns 8 and 27)
traindata <- as.data.frame(data[13000:13999,c(8,27)])
testdata <- as.data.frame(data[14000:14999,c(8,27)])
complete <- as.data.frame(data[13000:14999,c(8,27)])
# SEPARATE TEXT VECTOR TO CREATE Source(),
# Corpus() CONSTRUCTOR FOR DOCUMENT TERM
# MATRIX TAKES Source()
completevector <- as.vector(complete$Text)
# CREATE SOURCE FOR VECTORS
completesource <- VectorSource(completevector)
# CREATE CORPUS FOR DATA
completecorpus <- Corpus(completesource)
# STEM WORDS, REMOVE STOPWORDS, TRIM WHITESPACE
completecorpus <- tm_map(completecorpus,tolower)
completecorpus <- tm_map(completecorpus,PlainTextDocument)
completecorpus <- tm_map(completecorpus, stemDocument)
completecorpus <- tm_map(completecorpus, removeWords,stopwords("english"))
completecorpus <- tm_map(completecorpus,removePunctuation)
completecorpus <- tm_map(completecorpus,removeNumbers)
completecorpus <- tm_map(completecorpus,stripWhitespace)
# CREATE DOCUMENT TERM MATRIX
completematrix<-DocumentTermMatrix(completecorpus)
trainmatrix <- completematrix[1:1000,]
testmatrix <- completematrix[1001:2000,]
# TRAIN NAIVE BAYES MODEL USING trainmatrix DATA AND traindata$InfoType CLASS VECTOR
model <- naiveBayes(as.matrix(trainmatrix),as.factor(traindata$InfoType),laplace=1)
# PREDICTION
results <- predict(model,as.matrix(testmatrix))
conf.matrix<-table(results, testdata$InfoType,dnn=list('predicted','actual'))
conf.matrix
The problem is that I get strange results like this:
         actual
predicted   1   2   3
        1  60 833 107
        2   0   0   0
        3   0   0   0
Any idea why this happens?
The original data looks like this:
head(complete)
Text
13000 Milkshakes, milkshakes, whats not to love? Really like the durability and weight of the cup. Something about it sure makes good milkshakes.Works beautifully with the Cuisinart smart stick.
13001 excellent. shipped on time, is excellent for protein shakes with a cuisine art mixer. easy to clean and the mixer fits in perfectly
13002 Great cup. Simple and stainless steel great size cup for use with my cuisinart mixer. I can do milkshakes really easy and fast. Recommended. No problems with the shipping.
13003 Wife Loves This. Stainless steel....attractive and the best part is---it won't break. We are considering purchasing another one because they are really nice.
13004 Great! Stainless steel cup is great for smoothies, milkshakes and even chopping small amounts of vegetables for salads!Wish it had a top but still love it!
13005 Great with my. Stick mixer...the plastic mixing container cracked and became unusable as a result....the only downside is you can't see if the stuff you are mixing is mixed well
InfoType
13000 2
13001 2
13002 2
13003 3
13004 2
13005 2
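Hard to debug without data. You are splitting train and test by specific rows. It is quite possible that those rows do not contain all classes. You are better off randomly sampling rows for the test/train split. – Gopala
No, that didn't work. I tried splitting the rows randomly and got exactly the same result. – JorgeF
Just to make sure - does your confusion matrix (predicted v actual) say that all actual items belong to class 1, rather than that it predicted all of them to be class 1? – patrick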
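The two points raised in the comments (a random train/test split, and which axis of the confusion matrix is which) can be sketched in base R. This is a minimal sketch, not the asker's actual run: `labels` is a hypothetical stand-in for `complete$InfoType`, and `predicted` is a made-up constant classifier used only to show how the table is read. Only base R functions (`sample`, `setdiff`, `table`, `rowSums`, `colSums`) are used.

```r
# Hypothetical stand-in for the real labels: 2000 documents, 3 classes.
set.seed(42)  # reproducible split
labels <- factor(sample(1:3, 2000, replace = TRUE, prob = c(0.06, 0.83, 0.11)))

# Draw a random half of the row indices for training; the rest is the test set.
n         <- length(labels)
train_idx <- sample(n, size = n / 2)
test_idx  <- setdiff(seq_len(n), train_idx)

# The same indices would subset both the document-term matrix and the labels,
# e.g. completematrix[train_idx, ] and complete$InfoType[train_idx].
train_labels <- labels[train_idx]
test_labels  <- labels[test_idx]

# Check that every class occurs in both halves.
table(train_labels)
table(test_labels)

# Reading a confusion matrix: rows come from the first argument to table(),
# columns from the second. A classifier that always answers "1" fills only
# the first row.
predicted <- factor(rep(1, length(test_labels)), levels = levels(labels))
conf <- table(predicted, test_labels, dnn = c("predicted", "actual"))
rowSums(conf)  # how many times each class was predicted
colSums(conf)  # how many documents actually belong to each class
```

Read this way, the table in the question has row sums of 1000, 0 and 0 and column sums of 60, 833 and 107. Given the call `table(results, testdata$InfoType, ...)`, that pattern corresponds to every test document being predicted as class 1 while the actual labels are spread over all three classes.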