我對Python很熟悉,我正在學習Spark-Scala。在Spark-Scala中,如何將數組列表複製到DataFrame中?
我想建立具有由這種語法desribed結構的數據幀:
// Prepare training data from a list of (label, features) tuples.
val training = spark.createDataFrame(Seq(
(1.1, Vectors.dense(1.1, 0.1)),
(0.2, Vectors.dense(1.0, -1.0)),
(3.0, Vectors.dense(1.3, 1.0)),
(1.0, Vectors.dense(1.2, -0.5))
)).toDF("label", "features")
我從這個網址上面的語法: http://spark.apache.org/docs/latest/ml-pipeline.html
目前我的數據是數組,我已經退出出了DF的:
val my_a = gspc17_df.collect().map{row => Seq(row(2),Vectors.dense(row(3).asInstanceOf[Double],row(4).asInstanceOf[Double]))}
我的陣列的結構非常類似於上述DF:
my_a: Array[Seq[Any]] =
Array(
List(-1.4830674013266898, [-0.004192832940431825,-0.003170667657263393]),
List(-0.05876766500768526, [-0.008462913654529357,-0.006880595828929472]),
List(1.0109273250546658, [-3.1816797620416693E-4,-0.006502619326182358]))
如何將數據從我的數組複製到具有上述結構的DataFrame?
我想這句法:
val my_df = spark.createDataFrame(my_a).toDF("label","features")
星火我吼道:
<console>:105: error: inferred type arguments [Seq[Any]] do not conform to method createDataFrame's type parameter bounds [A <: Product]
val my_df = spark.createDataFrame(my_a).toDF("label","features")
^
<console>:105: error: type mismatch;
found : scala.collection.mutable.WrappedArray[Seq[Any]]
required: Seq[A]
val my_df = spark.createDataFrame(my_a).toDF("label","features")
^
scala>