Trying to implement predictRaw() for a classification model in Apache Spark

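The Developer API example (https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/DeveloperApiExample.scala) gives a simple example implementation of the function predictRaw() in a classification model. This is a function of the abstract class ClassificationModel that must be implemented in the concrete class. According to the Developer API example, it can be computed as follows: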

override def predictRaw(features: Features.Type): Vector = { 
    val margin = BLAS.dot(features, coefficients) 
    Vectors.dense(-margin, margin) // Binary classification so we return a length-2 vector, where index i corresponds to class i (i = 0, 1). 
}

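My understanding of BLAS.dot(features, coefficients) is that it is simply the dot product of the feature vector (of length numFeatures) with the coefficients vector (also of length numFeatures), so each feature is multiplied by its coefficient and the products are summed to give val margin. However, Spark no longer provides access to the BLAS library, because it is private in MLlib. Instead, matrix multiplication is provided through the Matrix trait, which has various factory methods for multiplying.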
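To make the arithmetic concrete, here is a minimal sketch in plain Scala (no Spark dependency) of the dot product and the length-2 raw prediction built from it. The object name MarginSketch and its method names are made up for illustration and are not part of Spark.

```scala
// Hypothetical helper, not part of Spark: illustrates the margin arithmetic only.
object MarginSketch {

  // Dot product of two equal-length arrays: sum over j of x(j) * w(j).
  def dot(x: Array[Double], w: Array[Double]): Double = {
    require(x.length == w.length, "features and coefficients must have the same length")
    var sum = 0.0
    var j = 0
    while (j < x.length) {
      sum += x(j) * w(j)
      j += 1
    }
    sum
  }

  // Binary classification: index 0 holds -margin, index 1 holds margin.
  def rawPrediction(x: Array[Double], w: Array[Double]): Array[Double] = {
    val margin = dot(x, w)
    Array(-margin, margin)
  }
}
```

For features (1.0, 2.0, 3.0) and coefficients (0.5, -1.0, 2.0) the margin is 0.5 - 2.0 + 6.0 = 4.5, so the raw prediction is (-4.5, 4.5).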

My understanding of how to implement predictRaw() using the matrix factory methods is as follows:

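override def predictRaw(features: Vector): Vector = {
  // coefficients is a Vector of length numFeatures: val coefficients = Vectors.zeros(numFeatures)
  // SparkDenseMatrix and SparkDenseVector are presumably import aliases for Spark's DenseMatrix and DenseVector.
  val coefficientsArray = coefficients.toArray
  // A 1 x numFeatures row matrix, so that multiply() accepts a vector of length numFeatures
  // (multiply requires the matrix's column count to equal the vector's size).
  val coefficientsMatrix: SparkDenseMatrix = new SparkDenseMatrix(1, numFeatures, coefficientsArray)
  val margin: Array[Double] = coefficientsMatrix.multiply(features).toArray // contains a single element
  val rawPredictions: Array[Double] = Array(-margin(0), margin(0))
  new SparkDenseVector(rawPredictions)
}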

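This incurs the overhead of converting the data structures to arrays. Is there a better way? It seems strange that BLAS is now private. NB: the code is untested! At the moment val coefficients: Vector is just a vector of zeros, but once I have implemented the learning algorithm it will contain the results.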
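One possible way to avoid both the private BLAS object and the intermediate matrix is to iterate over the active entries of the feature vector with foreachActive, which is public on org.apache.spark.ml.linalg.Vector. The following is only a sketch under that assumption: the object PredictRawSketch and its standalone methods are hypothetical, and in a real model coefficients would be a field of the class rather than a parameter. It needs Spark's ML linear algebra classes on the classpath.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Hypothetical standalone helper; in a real ClassificationModel the
// coefficients would be a member of the model rather than an argument.
object PredictRawSketch {

  // Dot product computed over the active (explicitly stored) entries of features.
  // coefficients(i) is a constant-time lookup for a dense coefficients vector.
  def margin(features: Vector, coefficients: Vector): Double = {
    require(features.size == coefficients.size,
      "features and coefficients must have the same size")
    var sum = 0.0
    features.foreachActive { (i, value) =>
      sum += value * coefficients(i)
    }
    sum
  }

  // Length-2 raw prediction for binary classification: (-margin, margin).
  def predictRaw(features: Vector, coefficients: Vector): Vector = {
    val m = margin(features, coefficients)
    Vectors.dense(-m, m)
  }
}
```

This skips zero entries when features is sparse and creates no temporary matrix. As far as I know, Spark 3.0 and later also expose a public dot method on Vector, which would reduce the margin to features.dot(coefficients); check the API docs for the version you are using before relying on it.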

Source

2017-08-30 LucieCBurgess

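I think I have solved this. The Spark Developer API example is very confusing, because its predictRaw() computes a margin-based confidence score for a logistic-regression-type example. However, what predictRaw() is actually supposed to do when implementing ClassificationModel is return, for each sample of the input dataset, a vector of raw predictions with one element per label. Technically, the matrix-multiplication approach above works without BLAS, but predictRaw() does not have to be calculated that way.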

From the underlying source code: https://github.com/apache/spark/blob/v2.2.0/mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala

   * @return  vector where element i is the raw prediction for label i.
   *          This raw prediction may be any real number, where a larger value indicates greater
   *          confidence for that label.

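The function raw2prediction then computes the actual label from the raw prediction, but it does not need to be implemented, because this is done by the API.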
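As I understand it, the default behaviour amounts to choosing the index of the largest raw score as the predicted label. The sketch below shows that idea in plain Scala; Raw2PredictionSketch and predictLabel are illustrative names, and this is not Spark's own code.

```scala
// Hypothetical illustration of turning raw scores into a label via argmax.
object Raw2PredictionSketch {

  // Returns the index of the largest raw score as a Double label.
  // Ties are resolved in favour of the lowest index.
  def predictLabel(rawPrediction: Array[Double]): Double = {
    require(rawPrediction.nonEmpty, "rawPrediction must not be empty")
    var best = 0
    var i = 1
    while (i < rawPrediction.length) {
      if (rawPrediction(i) > rawPrediction(best)) best = i
      i += 1
    }
    best.toDouble
  }
}
```

With the binary raw prediction (-margin, margin), a positive margin yields label 1.0 and a negative margin yields label 0.0.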

Source

2017-09-02 14:48:29 LucieCBurgess
