
Spark mllib threshold for f1Score

I am trying to find the threshold that gives my logistic regression the highest F1 score. However, when I run the following lines:

val f1Score = metrics.fMeasureByThreshold
f1Score.foreach { case (t, f) =>
  println(s"Threshold: $t, F-score: $f, Beta = 1")
}

Some strange values show up, for example:

Threshold: 2.0939996826644833, F-score: 0.285648784961027, Beta = 1 
Threshold: 2.093727854652065, F-score: 0.28604171441668574, Beta = 1 
Threshold: 2.0904571465313113, F-score: 0.2864344637946838, Beta = 1 
Threshold: 2.0884466833553468, F-score: 0.28682703321878583, Beta = 1 
Threshold: 2.0882666552407283, F-score: 0.2872194228126431, Beta = 1 
Threshold: 2.0835997800203447, F-score: 0.2876116326997939, Beta = 1 
Threshold: 2.077892816382506, F-score: 0.28800366300366304, Beta = 1 

How can a threshold be greater than one? The same question applies to the negative values that appear further down in the console output.

Answer


I had made a mistake earlier when converting my DataFrame to an RDD. Instead of writing:

val predictionAndLabels =predictions.select("probability", "labelIndex").rdd.map(x => (x(0).asInstanceOf[DenseVector](1), x(1).asInstanceOf[Double])) 

I wrote:

val predictionAndLabels =predictions.select("rawPredictions", "labelIndex").rdd.map(x => (x(0).asInstanceOf[DenseVector](1), x(1).asInstanceOf[Double])) 

So the thresholds were applied to the rawPredictions instead of the probabilities, and everything makes sense: raw predictions are the logistic regression's pre-sigmoid margins, so they are unbounded and can be greater than one or negative.
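
Putting the fix together, here is a minimal sketch of the corrected threshold search, assuming predictions is the DataFrame returned by the fitted model and the column names match the snippets above:

import org.apache.spark.ml.linalg.DenseVector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// P(label = 1) from the "probability" column, paired with the true label.
val predictionAndLabels = predictions.select("probability", "labelIndex").rdd
  .map(x => (x(0).asInstanceOf[DenseVector](1), x(1).asInstanceOf[Double]))

val metrics = new BinaryClassificationMetrics(predictionAndLabels)

// With probabilities as scores, every threshold lies in [0, 1];
// keep the (threshold, F-score) pair with the highest F-score.
val (bestThreshold, bestF1) = metrics.fMeasureByThreshold
  .reduce((a, b) => if (a._2 > b._2) a else b)

println(s"Best threshold: $bestThreshold, F1: $bestF1")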