Answer

There are two ways to accomplish this. One is to write your own method that takes over the responsibility of predictPoint in LogisticRegression.scala:

import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.{DenseVector, Vector}

object ClassificationUtility {
  // Returns the predicted class together with an array of per-class sigmoid scores.
  def predictPoint(dataMatrix: Vector, model: LogisticRegressionModel): (Double, Array[Double]) = {
    require(dataMatrix.size == model.numFeatures)
    // Number of weights stored per class (features plus the intercept, if one was fitted).
    val dataWithBiasSize: Int = model.weights.size / (model.numClasses - 1)
    val weightsArray: Array[Double] = model.weights match {
      case dv: DenseVector => dv.values
      case _ =>
        throw new IllegalArgumentException(
          s"weights only supports dense vector but got type ${model.weights.getClass}.")
    }
    var bestClass = 0
    var maxMargin = 0.0
    val withBias = dataMatrix.size + 1 == dataWithBiasSize
    // Index 0 (the pivot class) is never assigned below and stays at 0.0.
    val classProbabilities: Array[Double] = new Array[Double](model.numClasses)
    (0 until model.numClasses - 1).foreach { i =>
      var margin = 0.0
      dataMatrix.foreachActive { (index, value) =>
        if (value != 0.0) margin += value * weightsArray((i * dataWithBiasSize) + index)
      }
      // Intercept is required to be added into margin.
      if (withBias) {
        margin += weightsArray((i * dataWithBiasSize) + dataMatrix.size)
      }
      if (margin > maxMargin) {
        maxMargin = margin
        bestClass = i + 1
      }
      classProbabilities(i + 1) = 1.0 / (1.0 + Math.exp(-margin))
    }
    (bestClass.toDouble, classProbabilities)
  }
}

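Note that this is only slightly different from the original method: it simply computes the logistic function of the margin for the input features. It also defines some vals and vars that are private in the original and live outside that method. Finally, it stores the scores in an Array, indexed by class, and returns that array along with the best class. I call my method like so: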

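import org.apache.spark.mllib.regression.LabeledPoint

// Compute raw scores on the test set.
// `test` is an RDD[LabeledPoint]; `model` is the trained mllib LogisticRegressionModel.
val predictionAndLabelsAndProbabilities = test.map {
  case LabeledPoint(label, features) =>
    val (prediction, probabilities) = ClassificationUtility.predictPoint(features, model)
    (prediction, label, probabilities)
}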
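
To make the weight layout used by predictPoint easier to follow, here is a minimal Spark-free sketch of the same computation. It assumes the weights are flattened into (numClasses - 1) consecutive blocks, each holding one weight per feature plus an optional trailing intercept. The names MarginSketch and scores are hypothetical and exist only for illustration.

```scala
object MarginSketch {
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // weights: (numClasses - 1) blocks laid out back to back; each block has
  // features.length entries, plus one trailing intercept when a bias was fitted.
  def scores(features: Array[Double], weights: Array[Double], numClasses: Int): (Int, Array[Double]) = {
    val blockSize = weights.length / (numClasses - 1)
    val withBias = features.length + 1 == blockSize
    val out = new Array[Double](numClasses) // index 0 (pivot class) stays 0.0
    var bestClass = 0
    var maxMargin = 0.0
    for (i <- 0 until numClasses - 1) {
      val offset = i * blockSize
      var margin = 0.0
      for (j <- features.indices) margin += features(j) * weights(offset + j)
      if (withBias) margin += weights(offset + features.length)
      if (margin > maxMargin) { maxMargin = margin; bestClass = i + 1 }
      out(i + 1) = sigmoid(margin)
    }
    (bestClass, out)
  }
}
```

Note that the returned values are independent per-class sigmoid scores, so they are not guaranteed to sum to 1.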

However:

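It seems the Spark contributors are discouraging the use of MLlib in favor of ML. At the time of writing, the ML logistic regression API does not support multiclass classification. I am now using OneVsRest, which acts as a wrapper for one-vs-all classification. You can obtain the raw scores by iterating through the models: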

import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel, OneVsRest}

val lr = new LogisticRegression().setFitIntercept(true)
val ovr = new OneVsRest()
ovr.setClassifier(lr)
val ovrModel = ovr.fit(training)
// Save each underlying binary model, one per class.
ovrModel.models.zipWithIndex.foreach {
  case (model: LogisticRegressionModel, i: Int) =>
    model.save(s"model-${model.uid}-$i")
}

// The uid in these paths comes from my run; yours will differ.
val model0 = LogisticRegressionModel.load("model-logreg_457c82141c06-0")
val model1 = LogisticRegressionModel.load("model-logreg_457c82141c06-1")
val model2 = LogisticRegressionModel.load("model-logreg_457c82141c06-2")

Now that you have the individual models, you can compute a score for each one from its rawPrediction:

def sigmoid(x: Double): Double = {
  1.0 / (1.0 + Math.exp(-x))
}

// newRescaledData is the DataFrame to score.
// values(1) is the raw score of the positive class for each binary model.
// On Spark 2.x the vector type is org.apache.spark.ml.linalg.DenseVector instead.
val newPredictionAndLabels0 = model0.transform(newRescaledData)
  .select("prediction", "rawPrediction")
  .map(row => (row.getDouble(0),
    sigmoid(row.getAs[org.apache.spark.mllib.linalg.DenseVector](1).values(1))))
newPredictionAndLabels0.foreach(println)

val newPredictionAndLabels1 = model1.transform(newRescaledData)
  .select("prediction", "rawPrediction")
  .map(row => (row.getDouble(0),
    sigmoid(row.getAs[org.apache.spark.mllib.linalg.DenseVector](1).values(1))))
newPredictionAndLabels1.foreach(println)

val newPredictionAndLabels2 = model2.transform(newRescaledData)
  .select("prediction", "rawPrediction")
  .map(row => (row.getDouble(0),
    sigmoid(row.getAs[org.apache.spark.mllib.linalg.DenseVector](1).values(1))))
newPredictionAndLabels2.foreach(println)

Applying the sigmoid to the raw prediction yields the probability.
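
If you want a single result per row rather than three separate outputs, the per-model scores can be combined. The following is only a sketch under stated assumptions: rawScores is a hypothetical array holding the positive-class raw score from each binary model for one row, and dividing by the sum is an illustrative normalisation, not something the models do for you. Because the sigmoid is monotonic, the class with the largest raw score is also the class with the largest sigmoid score.

```scala
object OneVsRestScores {
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // rawScores(i): positive-class raw score from the i-th binary model (hypothetical input).
  // Returns the index of the highest-scoring class and the scores rescaled to sum to 1.
  def combine(rawScores: Array[Double]): (Int, Array[Double]) = {
    val probs = rawScores.map(sigmoid)
    val best = probs.indices.maxBy(i => probs(i))
    val total = probs.sum
    // Assumption: simple rescaling so the per-class scores sum to 1.
    val normalised = probs.map(_ / total)
    (best, normalised)
  }
}
```

For example, OneVsRestScores.combine(Array(-1.0, 2.0, 0.0)) selects class 1, since it has the largest raw score.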