Which method is best for calculating the Matthews correlation coefficient (MCC) for an independent dataset?
I'm not sure what "best method" means here, but given a confusion matrix, the calculation should be simple. In Python:
import math
# tp is true positives, fn is false negatives, etc
mcc = (tp*tn - fp*fn)/math.sqrt((tp + fp)*(tp + fn)*(tn + fp)*(tn + fn))
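If you are starting from label vectors rather than a ready-made confusion matrix, something like the following might help. It is only a sketch: the helper name `confusion_counts`, the 1 = positive / 0 = negative coding, and the sample labels are all assumptions made up for illustration, not part of the question.

```python
import math

def confusion_counts(actual, predicted):
    """Count (tp, tn, fp, fn) for binary labels coded 1 = positive, 0 = negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn

# Hypothetical labels, made up for illustration only.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(actual, predicted)
mcc = (tp*tn - fp*fn) / math.sqrt((tp + fp)*(tp + fn)*(tn + fp)*(tn + fn))
print(tp, tn, fp, fn, mcc)  # 3 3 1 1 0.5
```

Note that this version still divides by zero if any of the four sums is zero, for example when the classifier never predicts the positive class.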
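The previous answer is correct, but you may also want to handle the case where any of the four sums in the denominator is zero; in that case, the denominator can be arbitrarily set to 1.
For completeness, here is the R code (the original code can be found here):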
mcc <- function (actual, predicted)
{
  # Handles a zero denominator and integer overflow on large-ish products.
  #
  # actual = vector of true outcomes, 1 = Positive, 0 = Negative
  # predicted = vector of predicted outcomes, 1 = Positive, 0 = Negative
  # function returns MCC
  TP <- sum(actual == 1 & predicted == 1)
  TN <- sum(actual == 0 & predicted == 0)
  FP <- sum(actual == 0 & predicted == 1)
  FN <- sum(actual == 1 & predicted == 0)
  sum1 <- TP + FP
  sum2 <- TP + FN
  sum3 <- TN + FP
  sum4 <- TN + FN
  # as.double avoids integer overflow on large products
  denom <- as.double(sum1) * sum2 * sum3 * sum4
  if (any(sum1 == 0, sum2 == 0, sum3 == 0, sum4 == 0)) {
    denom <- 1
  }
  # as.double here too: TP * TN can overflow integer range just like the denominator
  mcc <- ((as.double(TP) * TN) - (as.double(FP) * FN)) / sqrt(denom)
  return(mcc)
}
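For anyone who wants the same zero-denominator guard in Python, here is a rough equivalent that works from the four counts. The function name `mcc_from_counts` is my own, and it assumes the counts are non-negative integers. Python integers do not overflow, so no `as.double`-style conversion is needed before multiplying.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """MCC from confusion-matrix counts, with the denominator set to 1 when it would be 0."""
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    # With non-negative counts, the product is zero exactly when one of the four sums is zero.
    if denom == 0:
        denom = 1
    return (tp * tn - fp * fn) / math.sqrt(denom)

print(mcc_from_counts(3, 3, 1, 1))  # 0.5
print(mcc_from_counts(0, 5, 0, 3))  # 0.0 (no positive predictions, guard applies)
```

Whenever the guard applies, the numerator is zero as well, so the function returns 0 in those cases.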