
I get predictions via spark.ml.classification.LogisticRegressionModel.predict. Several rows have 1.0 in the prediction column and .04 in the probability column. model.getThreshold is 0.5, so I would assume the model classifies everything above the 0.5 probability threshold as 1.0. How should I interpret the probability column in a Spark logistic regression prediction?

How should I interpret a result with a prediction of 1.0 and a probability of .04?
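For reference, this is roughly how I am looking at these rows (a trimmed-down sketch; model and df2 are stand-ins for my actual fitted LogisticRegressionModel and assembled feature DataFrame):

// model is a fitted LogisticRegressionModel, df2 a DataFrame with a "features" column
println(model.getThreshold)   // prints 0.5

model.transform(df2)
  .select("probability", "prediction")
  .where("prediction = 1.0")
  .show(false)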

Answer


The probability column produced by LogisticRegression should contain a list with as many entries as there are classes, where each index gives the probability of the corresponding class. I put together a small example with two classes:

import org.apache.spark.ml.feature.VectorAssembler
import spark.implicits._   // needed for .toDF() on a local collection

// A small toy data set with two classes (label 0.0 and 1.0)
case class Person(label: Double, age: Double, height: Double, weight: Double) 
val df = List(Person(0.0, 15, 175, 67), 
     Person(0.0, 30, 190, 100), 
     Person(1.0, 40, 155, 57), 
     Person(1.0, 50, 160, 56), 
     Person(0.0, 15, 170, 56), 
     Person(1.0, 80, 180, 88)).toDF() 

// Combine the numeric columns into a single feature vector
val assembler = new VectorAssembler()
    .setInputCols(Array("age", "height", "weight"))
    .setOutputCol("features")
// keep only the label and the assembled feature vector
val df2 = assembler.transform(df).select("label", "features") 
df2.show 

+-----+------------------+ 
|label|          features| 
+-----+------------------+ 
|  0.0| [15.0,175.0,67.0]| 
|  0.0|[30.0,190.0,100.0]| 
|  1.0| [40.0,155.0,57.0]| 
|  1.0| [50.0,160.0,56.0]| 
|  0.0| [15.0,170.0,56.0]| 
|  1.0| [80.0,180.0,88.0]| 
+-----+------------------+ 

import org.apache.spark.ml.classification.LogisticRegression

// Binary logistic regression with a bit of regularization
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8) 
// Note that with these weights roughly 70% of the rows end up in the test set
val Array(testing, training) = df2.randomSplit(Array(0.7, 0.3)) 

// Fit on the training rows and score the held-out rows
val model = lr.fit(training) 
val predictions = model.transform(testing) 
predictions.select("probability", "prediction").show(false) 


+----------------------------------------+----------+ 
|probability                             |prediction| 
+----------------------------------------+----------+ 
|[0.7487950501224138,0.2512049498775863] |0.0       | 
|[0.6458452667523259,0.35415473324767416]|0.0       | 
|[0.3888393314864866,0.6111606685135134] |1.0       | 
+----------------------------------------+----------+ 

Here you can see the probabilities alongside the final prediction made by the algorithm. The class that ends up with the highest probability is the predicted class.
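If you want to read a single class's probability as a plain number instead of eyeballing the vector, you can unpack the probability column; a minimal sketch, assuming Spark 3.0+ where org.apache.spark.ml.functions.vector_to_array is available (on older versions a small UDF over the Vector achieves the same):

import org.apache.spark.ml.functions.vector_to_array
import org.apache.spark.sql.functions.col

// Split the probability vector into one plain column per class:
// element 0 is P(label = 0.0), element 1 is P(label = 1.0)
predictions
  .withColumn("probs", vector_to_array(col("probability")))
  .select(
    col("probs").getItem(0).as("p_class0"),
    col("probs").getItem(1).as("p_class1"),
    col("prediction"))
  .show(false)

With the default threshold of 0.5, a binary prediction of 1.0 goes together with a class-1 probability above 0.5, so a lone value of .04 next to a 1.0 prediction is most likely the class-0 entry of the vector, not the probability of the predicted class.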