2016-11-18

Answer


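Yes, there is a way. First, split the first column with the split function. Then divide the resulting DataFrame into two DataFrames (by calling where twice, once per status value), and join those two DataFrames on the first column.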

With the Spark Scala API, this looks as follows:

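// Needed outside spark-shell; assumes a SparkSession named `spark` is in scope.
import org.apache.spark.sql.functions.split
import spark.implicits._ // enables toDF and the 'column symbol syntax

val x1status = Seq(
    ("kv,true",45),
    ("bm,true",65),
    ("mp,true",75),
    ("kv,null",450),
    ("bm,null",550),
    ("mp,null",650)).toDF("x1", "x2")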

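// Split "key,status" into an array, then overwrite x1 with the key
// and put the status into its own column.
val x1 = x1status
    .withColumn("split", split('x1, ","))
    .withColumn("x1", 'split getItem 0)
    .withColumn("status", 'split getItem 1)
    .drop("split")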

scala> x1.show
+---+---+------+
| x1| x2|status|
+---+---+------+
| kv| 45|  true|
| bm| 65|  true|
| mp| 75|  true|
| kv|450|  null|
| bm|550|  null|
| mp|650|  null|
+---+---+------+

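// "null" here is the literal string produced by the split, not a SQL NULL.
val trueDF = x1.where('status === "true").withColumnRenamed("x2", "true")
val nullDF = x1.where('status === "null").withColumnRenamed("x2", "null")

// Inner join on x1; drop("status") removes the status column from both sides.
val result = trueDF.join(nullDF, "x1").drop("status")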

scala> result.show
+---+----+----+
| x1|true|null|
+---+----+----+
| kv|  45| 450|
| bm|  65| 550|
| mp|  75| 650|
+---+----+----+
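
To make the three steps (split, filter twice, join) easier to follow without a Spark cluster, here is a minimal sketch of the same logic on plain Scala collections. It is not Spark code and not part of the answer above: the object name SplitAndJoinSketch and the names parsed, trueByKey, and nullByKey are hypothetical, and it assumes each key appears at most once per status (a Spark join would instead produce one row per matching pair).

```scala
// Plain-Scala sketch of the split / filter / join steps (no Spark required).
// All names here are illustrative assumptions, not part of the Spark API.
object SplitAndJoinSketch {
  // Same sample rows as the Spark example: ("key,status", value).
  val rows: Seq[(String, Int)] = Seq(
    ("kv,true", 45), ("bm,true", 65), ("mp,true", 75),
    ("kv,null", 450), ("bm,null", 550), ("mp,null", 650))

  // Step 1: split the first column into (key, status, value).
  val parsed: Seq[(String, String, Int)] = rows.map { case (x1, x2) =>
    val parts = x1.split(",")
    (parts(0), parts(1), x2)
  }

  // Step 2: filter twice, once per status, keeping key -> value.
  // Assumes at most one value per key and status; later duplicates would win.
  val trueByKey: Map[String, Int] =
    parsed.collect { case (k, "true", v) => k -> v }.toMap
  val nullByKey: Map[String, Int] =
    parsed.collect { case (k, "null", v) => k -> v }.toMap

  // Step 3: inner join on the key; keys missing from either side are dropped.
  val result: Seq[(String, Int, Int)] =
    parsed.map(_._1).distinct.flatMap { k =>
      for (t <- trueByKey.get(k); n <- nullByKey.get(k)) yield (k, t, n)
    }

  def main(args: Array[String]): Unit =
    result.foreach { case (k, t, n) => println(s"$k\t$t\t$n") }
}
```

As with the inner join in the Spark version, a key that has only a "true" row or only a "null" row does not appear in the result; if such keys should be kept, an outer join would be the thing to look at.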