2016-12-20 56 views
1

Spark: combining columns into a nested array

How can I combine columns in Spark into a nested array?

val inputSmall = Seq(
    ("A", 0.3, "B", 0.25), 
    ("A", 0.3, "g", 0.4), 
    ("d", 0.0, "f", 0.1), 
    ("d", 0.0, "d", 0.7), 
    ("A", 0.3, "d", 0.7), 
    ("d", 0.0, "g", 0.4), 
    ("c", 0.2, "B", 0.25)).toDF("column1", "transformedCol1", "column2", "transformedCol2") 

into something like

+-------+---------------+---------------+----------+
|column1|transformedCol1|transformedCol2|  combined|
+-------+---------------+---------------+----------+
|      A|            0.3|            0.3|[0.3, 0.3]|
+-------+---------------+---------------+----------+

Answer

8

If you want to combine multiple columns into a new column of array type, you can use the array function:

import org.apache.spark.sql.functions._ 
val result = inputSmall.withColumn("combined", array($"transformedCol1", $"transformedCol2")) 
result.show() 

+-------+---------------+-------+---------------+-----------+
|column1|transformedCol1|column2|transformedCol2|   combined|
+-------+---------------+-------+---------------+-----------+
|      A|            0.3|      B|           0.25|[0.3, 0.25]|
|      A|            0.3|      g|            0.4| [0.3, 0.4]|
|      d|            0.0|      f|            0.1| [0.0, 0.1]|
|      d|            0.0|      d|            0.7| [0.0, 0.7]|
|      A|            0.3|      d|            0.7| [0.3, 0.7]|
|      d|            0.0|      g|            0.4| [0.0, 0.4]|
|      c|            0.2|      B|           0.25|[0.2, 0.25]|
+-------+---------------+-------+---------------+-----------+
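
Since the question asks about a nested array, here is a minimal sketch of how the same array function might be applied a second time to get an array of arrays. It assumes Spark 2.x or later running with a local master. The object name CombineColumnsSketch, the helper withCombined and the column name "nested" are hypothetical, chosen only for illustration. All elements passed to array must share one type, which holds here because both inner arrays contain doubles.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col}

// Hypothetical helper object, not part of the original answer.
object CombineColumnsSketch {

  // Adds "combined" (array<double>) and "nested" (array<array<double>>).
  def withCombined(df: DataFrame): DataFrame =
    df
      // Flat array, as in the answer above.
      .withColumn("combined", array(col("transformedCol1"), col("transformedCol2")))
      // Nested array: an array whose elements are themselves array columns.
      .withColumn(
        "nested",
        array(col("combined"), array(col("transformedCol2"), col("transformedCol1"))))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is sufficient for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("combine-columns-sketch")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on a local Seq

    val inputSmall = Seq(
      ("A", 0.3, "B", 0.25),
      ("c", 0.2, "B", 0.25)
    ).toDF("column1", "transformedCol1", "column2", "transformedCol2")

    val result = withCombined(inputSmall)
    result.printSchema()             // shows the element type of each array column
    result.show(truncate = false)    // prints the rows without truncating the arrays

    spark.stop()
  }
}
```

Using col(...) instead of the $"..." syntax keeps the helper independent of spark.implicits._, so it can be called on any DataFrame that has the two transformed columns.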