Finding the Y value corresponding to a specific X

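I am trying to find the precision value that corresponds to a cutoff threshold of 0.5, as part of evaluating my model (logistic regression). Instead of a Y value, I get numeric(0).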

library(ROCR)
y_hat = predict(mdl, newdata=ds_ts, type="response")
pred = prediction(y_hat, ds_ts$popularity)
perfPrc = performance(pred, "prec")
xPrc = perfPrc@x.values[[1]]
yPrc = perfPrc@y.values[[1]]

# Find the precision value corresponding to a cutoff threshold of 0.5
prc = yPrc[c(0.5000188)] # perfPrc isn't continuous - closest value to 0.5
prc # output is 'numeric(0)'
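The numeric(0) is explained by how R indexes vectors: `yPrc[...]` selects elements by position, not by cutoff value, and a non-integer numeric index is truncated toward zero. So `yPrc[c(0.5000188)]` is the same as `yPrc[0]`, which is an empty vector. A minimal base-R illustration, using short made-up vectors in place of the real `xPrc` and `yPrc`:

```r
# Made-up stand-ins for the asker's vectors (values are illustrative only)
yPrc <- c(NaN, 1.0, 0.5, 0.6666667, 0.5, 0.4)   # precision at each cutoff
xPrc <- c(Inf, 0.9, 0.7, 0.5000188, 0.45, 0.3)  # the cutoffs themselves

# `[` indexes by position. A non-integer index is truncated toward zero,
# so yPrc[0.5000188] is the same as yPrc[0], a zero-length vector.
by_position <- yPrc[c(0.5000188)]
print(by_position)                       # numeric(0)
print(identical(by_position, yPrc[0]))   # [1] TRUE
print(yPrc[2.9])                         # [1] 1   (i.e. yPrc[2])

# Looking a value up by cutoff means finding its position in xPrc first.
# Exact equality only works with the stored value, not a rounded printout.
pos <- which(xPrc == 0.5000188)
print(yPrc[pos])                         # [1] 0.6666667
```

With real ROCR output the printed cutoff is rounded, so matching on the nearest cutoff (which.min(abs(xPrc - 0.5))) is safer than exact equality.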


2017-01-24 InterruptedException

What is your yPrc? –

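In your 'pred' definition, you pass a single vector as the 'newdata' argument. That won't work. You should give it a data frame, as you did in the 'y_hat' definition. If that doesn't help, please share how you created the model; the code or the "call" should be enough. – Gregor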

head(yPrc): [1] NaN 1.0000000 0.5000000 0.6666667 0.5000000 0.4000000 – InterruptedException

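Try this (assuming you have the model object mdl, and that your response variable popularity has two levels, 1 (positive) and 0). It applies the definition of precision directly. You could instead use a kNN-based non-parametric method that aggregates the precision values at the nearby cutoffs ROCR does report, or fit a curve Precision = f(Cutoff) to estimate the precision at a cutoff that isn't reported, but either would again be an approximation, whereas the definition gives you the correct result: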

p <- predict(mdl, newdata=ds_ts, type='response') # compute the prob that the output class label is 1 
test_cut_off <- 0.5 # this is the cut off value for which you want to find precision 
preds <- ifelse(p > test_cut_off, 1, 0) # find the class labels predicted with the new cut off 
prec <- sum((preds == 1) & (ds_ts$popularity == 1))/sum(preds == 1) # TP/(TP + FP)
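The direct computation above can be wrapped in a small function that accepts one or several cutoffs and returns NA when no observation is predicted positive (TP + FP = 0, so precision is undefined). The name `precision_at_cutoff`, its `strict` argument, and the toy data are hypothetical, for illustration only:

```r
# Hypothetical helper: precision = TP / (TP + FP) at one or more cutoffs.
# p: predicted probabilities of class 1; y: true labels coded 0/1.
precision_at_cutoff <- function(p, y, cutoff, strict = TRUE) {
  sapply(cutoff, function(cut) {
    # strict = TRUE mirrors the answer's `p > cutoff`; FALSE uses `p >= cutoff`
    pred_pos <- if (strict) p > cut else p >= cut
    n_pos <- sum(pred_pos)
    if (n_pos == 0) return(NA_real_)  # no predicted positives: undefined
    sum(pred_pos & y == 1) / n_pos    # TP / (TP + FP)
  })
}

# Made-up toy data: 6 predicted probabilities and their true labels
p <- c(0.9, 0.8, 0.6, 0.55, 0.4, 0.2)
y <- c(1,   0,   1,   1,    0,   0)
precision_at_cutoff(p, y, c(0.5, 0.7, 0.95))
# [1] 0.75 0.50   NA
```

With strict = TRUE this matches the p > test_cut_off rule used above; strict = FALSE also counts a score exactly equal to the cutoff as positive, which only matters when the cutoff coincides with a predicted score.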

[EDITED] Try the following simple experiment with randomly generated data (you can test it with your own data).

library(ROCR)
set.seed(1234)
ds_ts <- data.frame(x=rnorm(100), popularity=sample(0:1, 100, replace=TRUE))
mdl <- glm(popularity~x, ds_ts, family=binomial())
y_hat = predict(mdl, newdata=ds_ts, type="response")
pred = prediction(y_hat, ds_ts$popularity)
perfPrc = performance(pred, "prec")
xPrc = perfPrc@x.values[[1]]
yPrc = perfPrc@y.values[[1]]
plot(xPrc, yPrc, pch=19)

test_cut_off <- 0.5 # this is the cutoff value for which you want to find precision

# yPrc is indexed by position, not by cutoff value, so this does not look up the cutoff 0.5
# (and 0.5 is not one of the cutoffs in xPrc anyway)
prc = yPrc[c(test_cut_off)] # perfPrc isn't continuous
prc
# numeric(0)

# workaround: 1-NN, use the precision at the nearest cutoff as an approximate precision (the cutoff you used, 0.5000188, should be this nearest one)
nearest_cutoff_index <- which.min(abs(xPrc - test_cut_off)) 
approx_prec_at_cutoff <- yPrc[nearest_cutoff_index] 
approx_prec_at_cutoff 
# [1] 0.5294118 
points(test_cut_off, approx_prec_at_cutoff, pch=19, col='red', cex=2)

The red point shows the approximate precision (which may be exactly equal to the actual precision, if we're lucky).

# use average precision from k-NN 
k <- 3 # 3-NN 
nearest_cutoff_indices <- sort(abs(xPrc - test_cut_off), index.return=TRUE)$ix[1:k] 
approx_prec_at_cutoff <- mean(yPrc[nearest_cutoff_indices]) 
approx_prec_at_cutoff 
# [1] 0.5294881 
points(test_cut_off, approx_prec_at_cutoff, pch=19, col='red', cex=2)
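Another approximation, not used in this answer but in the spirit of fitting Precision = f(Cutoff), is plain linear interpolation between the two cutoffs that bracket 0.5, using base R's `approx()`. Like the k-NN average, it is only an estimate, because the true precision changes in steps as the cutoff crosses observed scores. ROCR's first cutoff is `Inf` with a `NaN` precision (compare head(yPrc) in the comments), so non-finite pairs are dropped first. The helper name `interp_precision` and the short vectors standing in for `xPrc`/`yPrc` are made up for illustration:

```r
# Made-up stand-ins for ROCR's xPrc (cutoffs) and yPrc (precision)
xPrc <- c(Inf, 0.9, 0.7, 0.52, 0.45, 0.3)
yPrc <- c(NaN, 1.0, 0.5, 0.6, 0.5, 0.4)

# Hypothetical helper: linearly interpolate precision at the cutoff(s) `at`
interp_precision <- function(cutoffs, precision, at) {
  keep <- is.finite(cutoffs) & is.finite(precision)  # drop Inf cutoff / NaN precision
  # approx() sorts by x itself; outside the finite range it returns NA (rule = 1)
  approx(cutoffs[keep], precision[keep], xout = at, ties = mean)$y
}

interp_precision(xPrc, yPrc, 0.5)
# [1] 0.5714286
```

Here 0.5 lies between the cutoffs 0.45 (precision 0.5) and 0.52 (precision 0.6), so the estimate is 0.5 + 0.1 * (0.05 / 0.07).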

p <- predict(mdl, newdata=ds_ts, type='response') 
preds <- ifelse(p > 0.5000188, 1, 0) 
actual_prec_at_cutoff <- sum((preds == 1) & (ds_ts$popularity == 1))/sum(preds == 1) # TP/(TP + FP) 
actual_prec_at_cutoff 
# [1] 0.5294118
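Why did the nearest cutoff give exactly the direct result here? ROCR's cutoffs are the distinct predicted scores plus a leading `Inf`, so the precision curve only changes at those scores. Assuming ROCR counts a score as positive when it is `>=` the cutoff, the precision for the rule p > 0.5 equals the precision at the smallest reported cutoff strictly above 0.5, so no approximation is needed. The sketch below rebuilds ROCR-style cutoff/precision vectors in base R under that assumption (so it runs without ROCR) and checks the lookup against the direct TP/(TP + FP) computation; `rocr_like_precision` and `exact_precision` are hypothetical helper names:

```r
# Hypothetical helper: ROCR-style cutoff/precision vectors in base R,
# ASSUMING cutoffs are Inf plus the distinct scores in decreasing order,
# and a score is predicted positive when score >= cutoff.
rocr_like_precision <- function(p, y) {
  cutoffs <- c(Inf, sort(unique(p), decreasing = TRUE))
  prec <- sapply(cutoffs, function(cut) {
    pos <- p >= cut
    sum(pos & y == 1) / sum(pos)   # 0/0 gives NaN at the Inf cutoff
  })
  list(x = cutoffs, y = prec)
}

# Exact precision for the rule `p > cutoff`: the precision at the smallest
# reported cutoff strictly greater than `cutoff`.
exact_precision <- function(xPrc, yPrc, cutoff) {
  above <- which(xPrc > cutoff)
  yPrc[above[which.min(xPrc[above])]]
}

# Simulated scores and labels (made up for the check)
set.seed(1234)
p <- runif(100)
y <- rbinom(100, 1, p)

curve <- rocr_like_precision(p, y)
lookup <- exact_precision(curve$x, curve$y, 0.5)

preds <- ifelse(p > 0.5, 1, 0)
direct <- sum((preds == 1) & (y == 1)) / sum(preds == 1)  # TP/(TP + FP)
all.equal(lookup, direct)  # TRUE
```

If ROCR's convention is as assumed, exact_precision(xPrc, yPrc, 0.5) could be applied directly to the vectors taken from perfPrc.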


2017-01-24 20:51:32

Thanks. I'd rather not compute it directly; I'm still not sure what was wrong with what I posted. – InterruptedException

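Nothing is wrong; it's just that if you want the precision at a specific cutoff that isn't among the x values, you need to write your own code to approximate it, for example by taking the precision at the nearest cutoff, or the average over the k nearest neighbors. –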

If you can share your data (or a sample of it), we can check it out. –
