explode函數應該完成。
pyspark版本:
>>> df = spark.createDataFrame([(1, "A", [1,2,3]), (2, "B", [3,5])],["col1", "col2", "col3"])
>>> from pyspark.sql.functions import explode
>>> df.withColumn("col3", explode(df.col3)).show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 1| A| 1|
| 1| A| 2|
| 1| A| 3|
| 2| B| 3|
| 2| B| 5|
+----+----+----+
斯卡拉版本
scala> val df = Seq((1, "A", Seq(1,2,3)), (2, "B", Seq(3,5))).toDF("col1", "col2", "col3")
df: org.apache.spark.sql.DataFrame = [col1: int, col2: string ... 1 more field]
scala> df.withColumn("col3", explode($"col3")).show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 1| A| 1|
| 1| A| 2|
| 1| A| 3|
| 2| B| 3|
| 2| B| 5|
+----+----+----+
你看了'爆炸()'函數? – mtoto
我不明白,如果它在一列上工作,其他列將會發生什麼情況。 – Gluz
也許你應該試試 – mtoto