How to supply prediction and label columns to BinaryClassificationMetrics evaluation for a Naive Bayes model

I am confused about the input to BinaryClassificationMetrics (MLlib). As of Apache Spark 1.6.0, we need to pass predictions and labels of type RDD[(Double, Double)], taken from a transformed DataFrame that has the columns prediction, probability (Vector) and rawPrediction (Vector).

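I have created an RDD[(Double, Double)] from the prediction and label columns. After running the BinaryClassificationMetrics evaluation on the NaiveBayesModel, I can retrieve ROC, PR, etc. But the values are limited and I cannot plot a curve from them: ROC contains 4 values and PR contains 3.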

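Is this the right way to prepare the (prediction, label) pairs, or do I need to use the rawPrediction column or the probability column instead of the prediction column?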

Source

2016-08-01 Desanth pv

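You should try giving 'BinaryClassificationMetrics' the raw probabilities, and set the number of bins when creating 'BinaryClassificationMetrics' to adjust the number of points. When using a model produced by Spark (such as LogisticRegressionModel), you need to clear the threshold to get the whole range of values.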

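@Hawknight. I edited the question to say **rawPrediction** instead of **rawProbability**. I have a scenario where I need to use NaiveBayesModel, and clearing the threshold is not available for this model. I hope the column you mean is the one I mention in this comment, and not **probability**.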

@Hawknight Is there any way to explicitly clear the threshold on a NaiveBayesModel?

Prepare the data like this:

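import org.apache.spark.mllib.linalg.Vector
// fit/transform belong to the DataFrame-based ml API, not mllib.classification.
import org.apache.spark.ml.classification.{NaiveBayes, NaiveBayesModel}

val df = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
val predictions = new NaiveBayes().fit(df).transform(df)

// Pair a continuous score with the label. Index 1 of the probability vector is
// the probability of label 1.0, which BinaryClassificationMetrics treats as positive.
val preds = predictions.select("probability", "label").rdd.map(row =>
  (row.getAs[Vector](0)(1), row.getAs[Double](1)))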

And evaluate:

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics 

new BinaryClassificationMetrics(preds, 10).roc

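If the predictions are only 0 or 1, the number of buckets can be lower than requested, as in your case. Try more complex data, like this: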

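import org.apache.spark.sql.functions.rand  // $"label" also needs sqlContext.implicits._ outside spark-shell
val anotherPreds = df.select(rand(), $"label").rdd.map(row => (row.getDouble(0), row.getDouble(1)))
new BinaryClassificationMetrics(anotherPreds, 10).roc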
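
To see why hard 0/1 predictions give so few points, here is a small plain-Scala sketch that needs no Spark. `RocSketch` and `rocPoints` are hypothetical names, not Spark APIs. The sketch assumes, as the Spark documentation describes for `roc`, that the curve has one point per distinct score threshold with (0.0, 0.0) prepended and (1.0, 1.0) appended. It ignores binning entirely.

```scala
object RocSketch {
  // Hypothetical helper, not part of Spark: one (FPR, TPR) point per distinct
  // score, with (0.0, 0.0) prepended and (1.0, 1.0) appended.
  // Assumes the input contains at least one positive and one negative label.
  def rocPoints(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val positives = scoreAndLabels.count(_._2 > 0.5).toDouble
    val negatives = scoreAndLabels.size - positives
    // Distinct scores, highest first; each one acts as a threshold.
    val thresholds = scoreAndLabels.map(_._1).distinct.sortWith(_ > _)
    val inner = thresholds.map { t =>
      val predictedPositive = scoreAndLabels.filter(_._1 >= t)
      val tp = predictedPositive.count(_._2 > 0.5)
      val fp = predictedPositive.size - tp
      (fp / negatives, tp / positives)
    }
    Seq((0.0, 0.0)) ++ inner ++ Seq((1.0, 1.0))
  }

  def main(args: Array[String]): Unit = {
    // Hard 0/1 predictions: only two distinct scores.
    val hard = Seq((1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0), (1.0, 1.0))
    // Probability-like scores: five distinct scores.
    val soft = Seq((0.9, 1.0), (0.6, 0.0), (0.4, 1.0), (0.1, 0.0), (0.8, 1.0))
    println(rocPoints(hard).size) // prints 4
    println(rocPoints(soft).size) // prints 7
  }
}
```

With the prediction column as the score there are only two thresholds, which matches the 4 ROC values reported in the question. A continuous score such as the probability column yields one point per distinct value, up to the bin limit passed to BinaryClassificationMetrics.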

Source

2016-08-01 20:56:45
