Spark MLlib: apply a function to all elements of a RowMatrix

I have a RowMatrix xw:

scala> xw 
res109: org.apache.spark.mllib.linalg.distributed.RowMatrix = org.apache.spark.mllib.linalg.distributed.RowMatrix@…

and I want to apply a function to every element of the RowMatrix:

f(x)=exp(-x*x)
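For concreteness, f can be written as an ordinary Scala function on Double. This is a minimal sketch: the object name ElementFunction is hypothetical, and it uses scala.math.exp rather than Breeze. Note that f(0) = 1, so f does not map zeros to zeros.

```scala
// Hypothetical helper object holding the question's element-wise function.
object ElementFunction {
  // f(x) = exp(-x * x); parsed as exp((-x) * x), which equals exp(-(x * x))
  def f(x: Double): Double = math.exp(-x * x)

  def main(args: Array[String]): Unit = {
    println(f(0.0))            // 1.0: f does NOT send 0 to 0
    println(f(1.0) == f(-1.0)) // true: f is even, so the sign of an entry is irrelevant
  }
}
```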

The matrix elements look like this:

scala> xw.rows.first 

res110: org.apache.spark.mllib.linalg.Vector = [0.008930720313311474,0.017169380001300985,-0.013414238595719104,0.02239106636801034,0.023009502628798143,0.02891937604244297,0.03378470969100948,0.03644030110678057,0.0031586143217048825,0.011230244437457062,0.00477455053405408,0.020251682490519785,-0.005429788421130285,0.011578489275815267,0.0019301805575977788,0.022513736483645713,0.009475039307158668,0.019457912132044935,0.019209006632742498,-0.029811133879879596]

My main problem is that I can't use map inside map:

scala> xw.rows.map(row => row.map(e => breeze.numerics.exp(e))) 
<console>:44: error: value map is not a member of org.apache.spark.mllib.linalg.Vector 
       xw.rows.map(row => row.map(e => breeze.numerics.exp(e))) 
            ^

scala>

How can I solve this?

2015-02-10 Donbeo

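This assumes you know that you actually have a DenseVector (which seems to be the case here). You can call toArray on the vector, which gives you an Array that does have a map method, and then convert the result back to a DenseVector with Vectors.dense: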

import org.apache.spark.mllib.linalg.Vectors

xw.rows.map{row => Vectors.dense(row.toArray.map{e => breeze.numerics.exp(e)})}
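The same toArray, map, rebuild pattern can be tried without a Spark cluster. In this sketch, DenseRow is a hypothetical stand-in for MLlib's DenseVector (like the real Vector it has toArray but no map), and a Seq plays the role of the RDD returned by xw.rows; it applies the question's f(x) = exp(-x*x) instead of plain exp.

```scala
// Spark-free sketch of: xw.rows.map { row => Vectors.dense(row.toArray.map(f)) }
object DenseRoundTrip {
  // Hypothetical stand-in for DenseVector: exposes toArray, deliberately no map.
  final case class DenseRow(values: Array[Double]) {
    def toArray: Array[Double] = values.clone()
  }

  def f(x: Double): Double = math.exp(-x * x)

  // Convert each row to an Array, map over it, and wrap the result in a new row.
  def applyToRows(rows: Seq[DenseRow], g: Double => Double): Seq[DenseRow] =
    rows.map(row => DenseRow(row.toArray.map(g)))

  def main(args: Array[String]): Unit = {
    val rows = Seq(DenseRow(Array(0.0, 1.0)), DenseRow(Array(-1.0, 2.0)))
    applyToRows(rows, f).foreach(r => println(r.toArray.mkString("[", ",", "]")))
  }
}
```

In Spark itself, xw.rows.map(...) returns an RDD[Vector] rather than a RowMatrix; wrapping it with new RowMatrix(...) from org.apache.spark.mllib.linalg.distributed should give you a matrix again.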

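You can do the same for a SparseVector; it is mathematically correct, but converting to an array can be extremely inefficient. Another option is to call row.copy and then use foreachActive, which makes sense for both dense and sparse vectors. However, copy might not be implemented for the particular Vector class you are using, and you can't mutate the data if you don't know the vector's type. If you really need to support both sparse and dense vectors, I would do something like this: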

import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vectors}

xw.rows.map{
    case denseVec: DenseVector =>
        Vectors.dense(denseVec.toArray.map{e => breeze.numerics.exp(e)})
    case sparseVec: SparseVector =>
        // we only need to update the values of the sparse vector; the indices remain the same
        // (this is only valid if the function maps 0 to 0)
        val newValues: Array[Double] = sparseVec.values.map{e => breeze.numerics.exp(e)}
        Vectors.sparse(sparseVec.size, sparseVec.indices, newValues)
}
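One caveat with the sparse branch: it only transforms the stored values, so it is correct only when the function maps 0 to 0. For exp, and for the question's f(x) = exp(-x*x), the implicit zeros should become 1, so the result is effectively dense. A Spark-free sketch of a match that checks this; Vec, DenseVec, SparseVec and mapVec are hypothetical stand-ins, not MLlib classes.

```scala
object SparseAwareMap {
  // Hypothetical stand-ins for MLlib's DenseVector / SparseVector.
  sealed trait Vec { def toArray: Array[Double] }

  final case class DenseVec(values: Array[Double]) extends Vec {
    def toArray: Array[Double] = values.clone()
  }

  final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec {
    // Expand to a full array: unlisted positions are implicit zeros.
    def toArray: Array[Double] = {
      val out = new Array[Double](size)
      indices.indices.foreach(k => out(indices(k)) = values(k))
      out
    }
  }

  def mapVec(v: Vec, g: Double => Double): Vec = v match {
    case DenseVec(values) => DenseVec(values.map(g))
    // Keeping the sparse structure is only valid when g(0) == 0.
    case s: SparseVec if g(0.0) == 0.0 => s.copy(values = s.values.map(g))
    // Otherwise every implicit zero changes too (e.g. exp(0) == 1), so densify.
    case s: SparseVec => DenseVec(s.toArray.map(g))
  }
}
```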

2015-02-12 15:26:17

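Thanks for the answer. So for Vectors.dense you suggest I use the line of code you provided? Could you write out the code for the second part of the answer? I'm a Scala beginner, so it's not very easy to follow – Donbeo 2015-02-12 15:36:16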

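@Donbeo I updated the answer a bit. If you are sure you have DenseVectors, go with the first solution. If you might have either sparse or dense vectors, you can use the second one, but note that even that doesn't handle other possible implementations of Vector (for example, it doesn't handle 'VectorUDT'). – 2015-02-12 17:58:13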
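On the point about other Vector implementations: the foreachActive idea from the answer avoids matching on concrete types altogether. This sketch writes into a fresh array instead of mutating a copy, so it also sidesteps the mutation problem. Vec, Dense, Sparse and mapAll are hypothetical stand-ins loosely modelled on MLlib's Vector; with the real class you would need a Spark version where foreachActive is accessible to your code.

```scala
object ForeachActiveSketch {
  // Hypothetical minimal vector interface: foreachActive visits the
  // explicitly stored (index, value) pairs, in the spirit of MLlib's Vector.
  trait Vec {
    def size: Int
    def foreachActive(g: (Int, Double) => Unit): Unit
  }

  final class Dense(val values: Array[Double]) extends Vec {
    def size: Int = values.length
    def foreachActive(g: (Int, Double) => Unit): Unit =
      values.indices.foreach(i => g(i, values(i)))
  }

  final class Sparse(val size: Int, val indices: Array[Int], val values: Array[Double]) extends Vec {
    def foreachActive(g: (Int, Double) => Unit): Unit =
      indices.indices.foreach(k => g(indices(k), values(k)))
  }

  // Works for any Vec without knowing its concrete type: positions that are
  // never visited are implicit zeros and get g(0.0); visited ones get g(v).
  def mapAll(v: Vec, g: Double => Double): Array[Double] = {
    val out = Array.fill(v.size)(g(0.0))
    v.foreachActive((i, x) => out(i) = g(x))
    out
  }
}
```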
