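I'm relatively new to Scala and the Spark API, but I'm trying to use VectorAssembler to combine the columns of a Spark Scala DataFrame into a single Vector column: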
http://spark.apache.org/docs/latest/ml-features.html#vectorassembler
so that I can then compute a correlation matrix with the statistics tools:
https://spark.apache.org/docs/2.1.0/mllib-statistics.html#correlations
The resulting DataFrame column has a linalg.Vector dtype:
val assembler = new VectorAssembler()
val trainwlabels3 = assembler.transform(trainwlabels2)
trainwlabels3.dtypes(0)
res90: (String, String) = (features,[email protected])
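But passing this column to sc.parallelize to build the RDD that the statistics tools expect still raises a type mismatch error: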
val data: RDD[Vector] = sc.parallelize(
trainwlabels3("features")
)
<console>:80: error: type mismatch;
found : org.apache.spark.sql.Column
required: Seq[org.apache.spark.mllib.linalg.Vector]
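As far as I can tell, the mismatch arises because trainwlabels3("features") returns a Column (a reference to the column, not its data), while sc.parallelize expects a local Seq. Below is a minimal sketch of what I think the conversion would need to look like, assuming Spark 2.x, where VectorAssembler emits org.apache.spark.ml.linalg.Vector while Statistics.corr expects the older org.apache.spark.mllib.linalg.Vector. The names data and correlMatrix are only illustrative, and I have not confirmed this against my own data:

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Vector => MLlibVector, Vectors => MLlibVectors}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

// Assumption: trainwlabels3 is the assembled DataFrame from above,
// with a "features" column of ml.linalg vectors.
val data: RDD[MLlibVector] = trainwlabels3
  .select("features")
  .rdd                                   // DataFrame -> RDD[Row]
  .map { row =>
    // Pull the ml vector out of the Row and convert it to an mllib vector
    MLlibVectors.fromML(row.getAs[MLVector]("features"))
  }

// Pearson correlation matrix across the vector components
val correlMatrix = Statistics.corr(data, "pearson")
```

The point of the sketch is that the rows are pulled out of the DataFrame via .rdd rather than wrapped with sc.parallelize, which is meant for collections that already live on the driver.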
Thanks in advance for your help.