2016-01-09

I have an RDD of org.apache.spark.mllib.linalg.Vector, where each vector holds three values [Int Int Int]. I am trying to convert this RDD to a DataFrame in Spark using Scala, with the following code:

import sqlContext.implicits._ 
import org.apache.spark.sql.types.StructType 
import org.apache.spark.sql.types.StructField 
import org.apache.spark.sql.types.DataTypes 
import org.apache.spark.sql.types.ArrayData 

vectrdd is an RDD of org.apache.spark.mllib.linalg.Vector.

val vectarr = vectrdd.toArray() 
case class RFM(Recency: Integer, Frequency: Integer, Monetary: Integer) 
val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF() 

I get the following error when converting it to a DataFrame:

warning: fruitless type test: a value of type   
org.apache.spark.mllib.linalg.Vector cannot also be a Array[T] 
val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF() 

error: pattern type is incompatible with expected type; 
found : Array[T] 
required: org.apache.spark.mllib.linalg.Vector 
val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF() 

The second approach I tried is this:

val vectarr=vectrdd.toArray().take(2) 
case class RFM(Recency: String, Frequency: String, Monetary: String) 
val df = vectrdd.map { case (t0, t1, t2) => RFM(p0, p1, p2) }.toDF() 

With it I get this error:

error: constructor cannot be instantiated to expected type; 
found : (T1, T2, T3) 
required: org.apache.spark.mllib.linalg.Vector 
val df = vectrdd.map { case (t0, t1, t2) => RFM(p0, p1, p2) }.toDF() 

I used this example as a guide: Convert RDD to Dataframe in Spark/Scala

Answer


vectarr has type Array[org.apache.spark.mllib.linalg.Vector], so the pattern Array(p0, p1, p2) cannot match: each element being matched is a Vector, not an Array.
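The mismatch can be reproduced without a cluster. The following is a minimal sketch, assuming spark-mllib is on the classpath; the object name VectorMatchSketch, the helper toRfm, and the lower-case field names are hypothetical and exist only for illustration. It shows that the Array pattern applies once the Vector has been converted with toArray, which yields an Array[Double].

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object VectorMatchSketch {
  // Hypothetical record type; the fields are Double because
  // Vector.toArray returns Array[Double].
  case class RFM(recency: Double, frequency: Double, monetary: Double)

  // Convert the Vector to Array[Double] first, then match on the array.
  // Matching `v` itself against Array(...) is what the compiler rejects.
  def toRfm(v: Vector): Option[RFM] = v.toArray match {
    case Array(r, f, m) => Some(RFM(r, f, m))
    case _              => None // the vector did not have exactly three elements
  }

  def main(args: Array[String]): Unit = {
    println(toRfm(Vectors.dense(1.0, 2.0, 3.0))) // Some(RFM(1.0,2.0,3.0))
    println(toRfm(Vectors.dense(1.0, 2.0)))      // None
  }
}
```

Returning an Option here is a design choice for the sketch: it avoids the MatchError that a single-case pattern would throw on a vector of any other length.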

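Also, you should not call val vectarr = vectrdd.toArray() at all. It collects the RDD into a local Array, and the final toDF call then fails, because toDF is available here on RDDs, not on a plain Array.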

The correct line (provided you change the RFM fields to Double) would be:

val df = vectrdd.map(_.toArray).map { case Array(p0, p1, p2) => RFM(p0, p1, p2)}.toDF() 

Or, equivalently, replace val vectarr = vectrdd.toArray() (which produces an Array[Vector]) with val arrayRDD = vectrdd.map(_.toArray) (which produces an RDD[Array[Double]]) and map over arrayRDD instead.
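Putting the pieces together, an end-to-end version might look like the sketch below. It assumes the Spark 1.x-style SQLContext API that the question uses and a local master; the object name, the helper toRfmDataFrame, and the sample values are made up for illustration. RFM is declared at the top level with Double fields so that toDF can derive a schema for it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

// Double fields, because Vector.toArray yields Array[Double].
case class RFM(Recency: Double, Frequency: Double, Monetary: Double)

object VectorRddToDataFrame {
  // Hypothetical helper: maps each three-element Vector to an RFM row.
  def toRfmDataFrame(sqlContext: SQLContext, vectrdd: RDD[Vector]): DataFrame = {
    import sqlContext.implicits._ // brings toDF into scope for RDDs of case classes
    vectrdd
      .map(_.toArray) // RDD[Array[Double]]; the data stays distributed
      .map { case Array(p0, p1, p2) => RFM(p0, p1, p2) } // MatchError if length != 3
      .toDF()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("vector-to-df").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Made-up sample data standing in for the asker's vectrdd.
    val vectrdd: RDD[Vector] = sc.parallelize(Seq(
      Vectors.dense(10.0, 3.0, 250.0),
      Vectors.dense(2.0, 7.0, 90.0)))

    val df = toRfmDataFrame(sqlContext, vectrdd)
    df.printSchema()
    df.show()

    sc.stop()
  }
}
```

If the values must end up as integers, as the original RFM class suggests, one option is to convert inside the map, for example RFM(p0.toInt, p1.toInt, p2.toInt) with Int fields, at the cost of truncating any fractional part.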