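Try this, by applying the definition of precision. It assumes you have the model object mdl and that your response variable popularity has two levels, 1 (positive) and 0. You could instead try a non-parametric method based on kNN to aggregate the precision values at the existing cutoff points, or fit a curve Precision = f(Cutoff) to find the precision at an unknown cutoff, but either of those is again an approximation, whereas going by the definition of precision gives you the correct result: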
p <- predict(mdl, newdata=ds_ts, type='response') # compute the prob that the output class label is 1
test_cut_off <- 0.5 # this is the cut off value for which you want to find precision
preds <- ifelse(p > test_cut_off, 1, 0) # find the class labels predicted with the new cut off
prec <- sum((preds == 1) & (ds_ts$popularity == 1))/sum(preds == 1) # TP/(TP + FP)
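The same calculation can be wrapped in a small function so it can be reused for any cutoff. This is only a minimal sketch: the name precision_at_cutoff and the toy vectors are made up for illustration and are not part of the original answer. It also returns NA when nothing is predicted positive, since TP/(TP + FP) is undefined in that case.

```r
# Hypothetical helper (not from the original answer): precision at a cutoff,
# computed directly from the definition TP / (TP + FP).
precision_at_cutoff <- function(probs, actual, cutoff) {
  preds <- ifelse(probs > cutoff, 1, 0)        # class labels at this cutoff
  n_pred_pos <- sum(preds == 1)                # TP + FP
  if (n_pred_pos == 0) return(NA_real_)        # precision is undefined here
  sum(preds == 1 & actual == 1) / n_pred_pos   # TP / (TP + FP)
}

# Toy data, made up purely for illustration
probs  <- c(0.9, 0.8, 0.6, 0.4, 0.2)
actual <- c(1,   0,   1,   1,   0)

precision_at_cutoff(probs, actual, 0.5)   # 2 of the 3 predicted positives are correct
precision_at_cutoff(probs, actual, 0.95)  # NA: nothing is predicted positive
```

With the objects from the answer, the call would be precision_at_cutoff(p, ds_ts$popularity, test_cut_off).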
[EDITED] Try the following simple experiment with randomly generated data (you can test it with your own data).
library(ROCR) # provides prediction() and performance()
set.seed(1234)
ds_ts <- data.frame(x=rnorm(100), popularity=sample(0:1, 100, replace=TRUE))
mdl <- glm(popularity~x, ds_ts, family=binomial())
y_hat = predict(mdl, newdata=ds_ts, type="response")
pred = prediction(y_hat, ds_ts$popularity)
perfPrc = performance(pred, "prec")
xPrc = perfPrc@x.values[[1]] # cutoffs
yPrc = perfPrc@y.values[[1]] # precision at each cutoff
plot(xPrc, yPrc, pch=19)
test_cut_off <- 0.5 # this is the cut off value for which you want to find precision
# Find the precision value corresponding to a cutoff threshold; since 0.5 is not among the cutoffs, you can't get it this way
prc = yPrc[c(test_cut_off)] # perfPrc isn't continuous
prc
# numeric(0)
# workaround: 1-NN, use the precision at the nearest cutoff to get an approximate precision, the one you have used should work
nearest_cutoff_index <- which.min(abs(xPrc - test_cut_off))
approx_prec_at_cutoff <- yPrc[nearest_cutoff_index]
approx_prec_at_cutoff
# [1] 0.5294118
points(test_cut_off, approx_prec_at_cutoff, pch=19, col='red', cex=2)
The red point shows the approximate precision (which may be exactly equal to the actual precision, if we are lucky).
# use average precision from k-NN
k <- 3 # 3-NN
nearest_cutoff_indices <- sort(abs(xPrc - test_cut_off), index.return=TRUE)$ix[1:k]
approx_prec_at_cutoff <- mean(yPrc[nearest_cutoff_indices])
approx_prec_at_cutoff
# [1] 0.5294881
points(test_cut_off, approx_prec_at_cutoff, pch=19, col='red', cex=2)
# for comparison: precision computed directly from its definition at a cutoff of 0.5000188
p <- predict(mdl, newdata=ds_ts, type='response')
preds <- ifelse(p > 0.5000188, 1, 0)
actual_prec_at_cutoff <- sum((preds == 1) & (ds_ts$popularity == 1))/sum(preds == 1) # TP/(TP + FP)
actual_prec_at_cutoff
# [1] 0.5294118
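The curve-based alternative mentioned at the top, Precision = f(Cutoff), could be sketched with linear interpolation using approx() from base R. The cutoff and precision values below are made up and only stand in for xPrc and yPrc. The first pair mimics the NaN at the head of yPrc, and pairing it with an infinite cutoff is an assumption of this sketch, so non-finite pairs are dropped first. Precision is really a step function of the cutoff, so this is still an approximation, not the exact value.

```r
# Made-up cutoff/precision pairs standing in for xPrc and yPrc
cutoffs    <- c(Inf, 0.8, 0.6, 0.4, 0.2)
precisions <- c(NaN, 0.90, 0.70, 0.60, 0.50)

# Keep only pairs where both the cutoff and the precision are finite
ok <- is.finite(cutoffs) & is.finite(precisions)

# Linear interpolation of Precision = f(Cutoff) at an unobserved cutoff
interp <- approx(x = cutoffs[ok], y = precisions[ok], xout = 0.5)
interp$y  # lies between the precisions at cutoffs 0.4 and 0.6
```

With the objects from the experiment, the equivalent call would use xPrc and yPrc in place of cutoffs and precisions, and test_cut_off as xout.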
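What is your yPrc? –
In your definition of 'pred', you pass a single vector as the 'newdata' argument. That's not good. You should give it a data frame, as you do in the definition of 'y_hat'. If that doesn't work, you should share information about how you created the model. The code or the "call" should be enough. – Gregor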
head(yPrc): [1] NaN 1.0000000 0.5000000 0.6666667 0.5000000 0.4000000 – InterruptedException