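Yes, there is a way. First split the first column on "," using the split
function, then split the resulting DataFrame into two DataFrames (using where
twice), and finally join those two new DataFrames on the first column.
With the Spark API for Scala it would look like this: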
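// Run in spark-shell; elsewhere, import spark.implicits._ and
// org.apache.spark.sql.functions.split first.
val x1status = Seq(
  ("kv,true",45),
  ("bm,true",65),
  ("mp,true",75),
  ("kv,null",450),
  ("bm,null",550),
  ("mp,null",650)).toDF("x1", "x2")
// Split "kv,true" into the key (x1) and the status, then drop the helper column.
val x1 = x1status
  .withColumn("split", split('x1, ","))
  .withColumn("x1", 'split getItem 0)
  .withColumn("status", 'split getItem 1)
  .drop("split")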
scala> x1.show
+---+---+------+
| x1| x2|status|
+---+---+------+
| kv| 45|  true|
| bm| 65|  true|
| mp| 75|  true|
| kv|450|  null|
| bm|550|  null|
| mp|650|  null|
+---+---+------+
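// status holds the literal strings "true" and "null" (not SQL NULL), so === works for both.
val trueDF = x1.where('status === "true").withColumnRenamed("x2", "true")
val nullDF = x1.where('status === "null").withColumnRenamed("x2", "null")
// Join on the key; drop("status") removes the status column coming from each side.
val result = trueDF.join(nullDF, "x1").drop("status")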
scala> result.show
+---+----+----+
| x1|true|null|
+---+----+----+
| kv|  45| 450|
| bm|  65| 550|
| mp|  75| 650|
+---+----+----+
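
As a possible alternative to the two where calls plus the join, the same reshaping can likely be done in one step with groupBy and pivot. This is a different technique from the one above, shown only as a sketch: it assumes each (x1, status) pair occurs at most once, so taking first("x2") is safe, and the session setup (master, app name) is purely illustrative. The row order of the output is not guaranteed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{first, split}

// In spark-shell, getOrCreate() simply returns the existing session.
// The master and app name here are assumptions for a local run.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("pivot-sketch")
  .getOrCreate()
import spark.implicits._

val x1status = Seq(
  ("kv,true", 45), ("bm,true", 65), ("mp,true", 75),
  ("kv,null", 450), ("bm,null", 550), ("mp,null", 650)
).toDF("x1", "x2")

// Same split as before: "kv,true" becomes x1 = "kv", status = "true".
val x1 = x1status
  .withColumn("split", split($"x1", ","))
  .withColumn("x1", $"split".getItem(0))
  .withColumn("status", $"split".getItem(1))
  .drop("split")

// Pivot the status values into columns. Listing the values explicitly
// fixes the column order and spares Spark a pass to discover them.
val result = x1
  .groupBy("x1")
  .pivot("status", Seq("true", "null"))
  .agg(first("x2"))

result.show()
```

Compared with the join, this avoids building two intermediate DataFrames, but it relies on the assumption that there are no duplicate keys per status; with duplicates, first would pick an arbitrary value.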