naiveBayes gives unexpected results when using a non-zero laplace argument (package e1071)

I am trying to use the naiveBayes() function from the e1071 package. When I pass a non-zero laplace argument, my probability estimates do not change, and I don't understand why.

Example:

library(e1071) 

# Generate data 
train.x <- data.frame(x1=c(1,1,0,0), x2=c(1,0,1,0)) 
train.y <- factor(c("cat", "cat", "dog", "dog")) 
test.x <- data.frame(x1=c(1), x2=c(1)) 

# without laplace smoothing 
classifier <- naiveBayes(x=train.x, y=train.y, laplace=0) 
predict(classifier, test.x, type="raw") # returns (1, 0.00002507) 

# with laplace smoothing 
classifier <- naiveBayes(x=train.x, y=train.y, laplace=1) 
predict(classifier, test.x, type="raw") # returns (1, 0.00002507)

I expected the probabilities to change in this case, because every training instance of the "dog" class has 0 for x1. To check this, here is the same thing in Python:

import numpy as np 
import pandas as pd 
from sklearn.naive_bayes import BernoulliNB 

train_x = pd.DataFrame({'x1':[1,1,0,0], 'x2':[1,0,1,0]}) 
train_y = np.array(["cat", "cat", "dog", "dog"]) 
test_x = pd.DataFrame({'x1':[1,], 'x2':[1,]}) 

# alpha (i.e. laplace = 0) 
classifier = BernoulliNB(alpha=.00000001) 
classifier.fit(X=train_x, y=train_y) 
classifier.predict_proba(X=test_x) # returns (1, 0) 

# alpha (i.e. laplace = 1) 
classifier = BernoulliNB(alpha=1) 
classifier.fit(X=train_x, y=train_y) 
classifier.predict_proba(X=test_x) # returns (.75, .25)

Why am I getting this unexpected result with e1071?

2016-04-26 Ben

Laplace smoothing is only applied to categorical features, not to numeric ones. You can see this in the source code:

## estimation-function 
est <- function(var) 
    if (is.numeric(var)) { 
     cbind(tapply(var, y, mean, na.rm = TRUE), 
       tapply(var, y, sd, na.rm = TRUE)) 
    } else { 
     tab <- table(y, var) 
     (tab + laplace)/(rowSums(tab) + laplace * nlevels(var)) 
    }

For numeric values a Gaussian estimate is used instead. So convert your data to factors and you are good to go.

train.x <- data.frame(x1=factor(c("1","1","0","0")), x2=factor(c("1","0","1","0"))) 
train.y <- factor(c("cat", "cat", "dog", "dog")) 
# give the test factors the same levels as the training data, so that "1" maps to the "1" column 
test.x <- data.frame(x1=factor("1", levels=c("0","1")), x2=factor("1", levels=c("0","1"))) 

# without laplace smoothing 
classifier <- naiveBayes(x=train.x, y=train.y, laplace=0) 
predict(classifier, test.x, type="raw") # almost all of the probability goes to cat 

# with laplace smoothing 
classifier <- naiveBayes(x=train.x, y=train.y, laplace=1) 
predict(classifier, test.x, type="raw") # 75% for cat, 25% for dog
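
To make the categorical branch of the estimation function quoted above concrete, here is a minimal Python sketch of the same arithmetic, applied to the x1 counts from the question. The helper name smoothed_table is made up for this illustration and is not part of e1071 or scikit-learn; it only mirrors the formula (tab + laplace)/(rowSums(tab) + laplace * nlevels(var)).

```python
import numpy as np

def smoothed_table(counts, laplace):
    """Hypothetical helper mirroring the categorical branch of est():
    (tab + laplace) / (rowSums(tab) + laplace * nlevels(var))."""
    counts = np.asarray(counts, dtype=float)
    n_levels = counts.shape[1]                         # number of factor levels
    row_totals = counts.sum(axis=1, keepdims=True)     # observations per class
    return (counts + laplace) / (row_totals + laplace * n_levels)

# Counts of x1 by class: rows are (cat, dog), columns are the levels ("0", "1").
x1_counts = [[0, 2],
             [2, 0]]

print(smoothed_table(x1_counts, laplace=0))  # zero counts stay at probability 0
print(smoothed_table(x1_counts, laplace=1))  # zero counts become 1/4
```

With laplace=1 the zero cells move from 0 to 0.25, which is the change the question was expecting to see. Nothing comparable happens in the numeric branch, because it never reads the laplace value.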

2016-04-26 21:32:19 lejlot

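Major facepalm on this one. The naiveBayes() method was interpreting x1 and x2 as numeric variables, and was therefore trying to use Gaussian conditional probability distributions internally (I think). Encoding my variables as factors solved my problem.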

train.x <- data.frame(x1=factor(c(1,1,0,0)), x2=factor(c(1,0,1,0))) 
train.y <- factor(c("cat", "cat", "dog", "dog")) 
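# keep both levels on the test factors so they line up with the training data 
test.x <- data.frame(x1=factor(1, levels=c(0,1)), x2=factor(1, levels=c(0,1)))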
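
As a rough illustration of why the laplace argument had no effect on the original numeric columns, the sketch below reproduces in Python what the numeric branch of the estimation function computes: a per-class mean and sample standard deviation, with no smoothing term anywhere. The function name gaussian_estimates is invented for this example, and ddof=1 is used on the assumption that it matches R's sd().

```python
import numpy as np

def gaussian_estimates(values, labels):
    """Hypothetical helper: per-class (mean, sample sd), like the numeric branch of est()."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    estimates = {}
    for cls in np.unique(labels):
        group = values[labels == cls]
        # ddof=1 gives the sample standard deviation (divides by n - 1)
        estimates[str(cls)] = (float(group.mean()), float(group.std(ddof=1)))
    return estimates

labels = ["cat", "cat", "dog", "dog"]
x1 = [1, 1, 0, 0]
x2 = [1, 0, 1, 0]

print(gaussian_estimates(x1, labels))  # both classes have sd 0 for x1
print(gaussian_estimates(x2, labels))  # both classes have mean 0.5 for x2
```

Since these estimates depend only on the means and standard deviations, changing laplace leaves them untouched, which fits the identical predictions in the question.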

2016-04-26 21:28:16 Ben
