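I asked a question this morning, but I deleted it and am reposting it here with better wording.
How to apply a Naive Bayes model to new data
I created my first machine learning model using train and test data. I got back a confusion matrix and saw some summary statistics.
I would now like to apply the model to new data to make predictions, but I don't know how.
Context: predicting monthly "churn" cancellations. The target variable is "churned", and it has two possible labels, "churned" and "not churned".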
head(tdata)
months_subscription nvk_medium org_type churned
1 25 none Community not churned
2 7 none Sports clubs not churned
3 28 none Sports clubs not churned
4 18 unknown Religious congregations and communities not churned
5 15 none Association - Professional not churned
6 9 none Association - Professional not churned
Here is my training and testing code:
library("klaR")
library("caret")
# import data
test_data_imp <- read.csv("tdata.csv")
# subset only required vars
# had to remove "revenue" since all churned records are 0 (need last price point)
variables <- c("months_subscription", "nvk_medium", "org_type", "churned")
tdata <- test_data_imp[variables]
#training
rn_train <- sample(nrow(tdata),
floor(nrow(tdata)*0.75))
train <- tdata[rn_train,]
test <- tdata[-rn_train,]
model <- NaiveBayes(churned ~., data=train)
# testing
predictions <- predict(model, test)
# caret expects the predicted classes first, then the true labels
confusionMatrix(predictions$class, test$churned)
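Everything worked fine up to this point.
Now I have new data, with the same structure and layout as tdata above. How can I apply my model to this new data to make predictions? Intuitively, I'm looking for a new column holding the predicted class for each record.
I tried this: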
## prediction ##
# import data
data_imp <- read.csv("pdata.csv")
pdata <- data_imp[variables]
actual_predictions <- predict(model, pdata)
#append to data and output (as head by default)
predicted_data <- cbind(pdata, actual_predictions$class)
# output
head(predicted_data)
Which threw this error:
actual_predictions <- predict(model, pdata)
Error in object$tables[[v]][, nd] : subscript out of bounds
In addition: Warning messages:
1: In FUN(1:6433[[4L]], ...) :
Numerical 0 probability for all classes with observation 1
2: In FUN(1:6433[[4L]], ...) :
Numerical 0 probability for all classes with observation 2
3: In FUN(1:6433[[4L]], ...) :
Numerical 0 probability for all classes with observation 3
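How can I apply my model to the new data? I'd like a new data frame with a new column containing the predicted class.
**Following the comment below, here are the head and str of the new prediction data**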
head(pdata)
months_subscription nvk_medium org_type churned
1 26 none Community not churned
2 8 none Sports clubs not churned
3 30 none Sports clubs not churned
4 19 unknown Religious congregations and communities not churned
5 16 none Association - Professional not churned
6 10 none Association - Professional not churned
> str(pdata)
'data.frame': 6433 obs. of 4 variables:
$ months_subscription: int 26 8 30 19 16 10 3 5 14 2 ...
$ nvk_medium : Factor w/ 16 levels "cloned","CommunityIcon",..: 9 9 9 16 9 9 9 3 12 9 ...
$ org_type : Factor w/ 21 levels "Advocacy and civic activism",..: 8 18 18 14 6 6 11 19 6 8 ...
$ churned : Factor w/ 1 level "not churned": 1 1 1 1 1 1 1 1 1 1 ...
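What does the data in the variable `pdata` look like? Can you add the result of `head(pdata)`? – tguzella
Hi @tguzella, it's exactly the same as tdata, except that all instances of churned say "not churned" (because I want to predict which ones will churn). –
Well, given the error, I'm inclined to think the data is not the same as `tdata`... The error seems to be triggered while processing one of the features, but if you don't show the data, it's simply impossible to know what went wrong. – tguzella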
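One way to check the suspicion in the last comment is to compare the factor levels of the new data against the training data. The `object$tables[[v]][, nd]` in the error suggests a lookup in a per-feature table, which could plausibly fail if a factor in `pdata` has levels (or a level order) that `train` did not have. That is a guess, not a confirmed diagnosis. Below is a minimal base-R sketch of such a check; `align_factor_levels`, `toy_train` and `toy_new` are hypothetical names with made-up values standing in for `train` and `pdata`.

```r
# Hypothetical helper: re-code the factor columns of `newdata` so they use
# exactly the levels (and level order) found in `reference`.
align_factor_levels <- function(newdata, reference) {
  for (col in intersect(names(newdata), names(reference))) {
    if (is.factor(reference[[col]])) {
      # Values that never occur in `reference` become NA
      newdata[[col]] <- factor(as.character(newdata[[col]]),
                               levels = levels(reference[[col]]))
    }
  }
  newdata
}

# Toy data standing in for train / pdata (made-up values)
toy_train <- data.frame(
  months_subscription = c(25, 7, 28),
  org_type = factor(c("Community", "Sports clubs", "Community"))
)
toy_new <- data.frame(
  months_subscription = c(26, 8),
  org_type = factor(c("Sports clubs", "Youth group"))
)

# Levels present in the new data but absent from the training data
unseen_levels <- setdiff(levels(toy_new$org_type), levels(toy_train$org_type))
print(unseen_levels)

# Re-code the new data against the training levels
toy_aligned <- align_factor_levels(toy_new, toy_train)
print(levels(toy_aligned$org_type))
```

If `setdiff()` returns anything for `nvk_medium` or `org_type` in the real data, those rows hold categories the model never saw. The sketch turns them into `NA` so they can be inspected or handled separately. Whether `predict()` then behaves as desired on the re-coded data is something to verify rather than assume.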