Dataframe.map needs to return more rows than the dataset contains

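I am using Scala and Spark and have a simple dataframe.map that produces the required transformation of the data. However, I also need to output an additional row holding the original data alongside the modified one. How can I get that out of dataframe.map?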

For example, dataset from:

id, name, age
1, john, 23
2, peter, 32

If age < 25, default it to 25.

Dataset to:

id, name, age
1, john, 25
1, john, -23
2, peter, 32

Source

2016-07-05 Pacchy

Would a 'unionAll' handle it?

For example:

df1 = original dataframe
df2 = transformed df1

df1.unionAll(df2)

Edit: implementation using unionAll()

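import org.apache.spark.sql.functions.udf
import sqlContext.implicits._   // enables the $"column" syntax

val df1 = sqlContext.createDataFrame(Seq((1, "john", 23), (2, "peter", 32))).
      toDF("id", "name", "age")

// Ages below 25 default to 25; all other ages pass through unchanged.
val udfTransform = udf[Int, Int] { age => if (age < 25) 25 else age }

// Keep only the original rows whose age the transformation would change.
val df2 = df1.withColumn("age2", udfTransform($"age")).
      where("age != age2").
      drop("age2")

// Transformed rows plus the preserved originals.
df1.withColumn("age", udfTransform($"age")).
    unionAll(df2).
    orderBy("id").
    show()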

+---+-----+---+
| id| name|age|
+---+-----+---+
|  1| john| 25|
|  1| john| 23|
|  2|peter| 32|
+---+-----+---+

Note: the implementation differs slightly from the (naive) solution proposed at first. The devil is always in the details!
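One way to see why the naive version needs adjusting is to replay both variants on plain Scala collections. This is only a sketch: the `Seq` values stand in for DataFrames, and `Person`, `clampAge` and `NaiveUnionSketch` are hypothetical names that do not appear in the answer. It suggests that a plain union of the original and transformed data repeats every row the rule leaves unchanged, which is what the `where("age != age2")` filter avoids.

```scala
// Plain Scala collections stand in for DataFrames (an assumption made
// purely for illustration); Person and clampAge are hypothetical names.
object NaiveUnionSketch {
  final case class Person(id: Int, name: String, age: Int)

  val original: Seq[Person] =
    Seq(Person(1, "john", 23), Person(2, "peter", 32))

  // Same rule as the UDF in the answer: ages below 25 default to 25.
  def clampAge(p: Person): Person =
    if (p.age < 25) p.copy(age = 25) else p

  val transformed: Seq[Person] = original.map(clampAge)

  // Naive union: every unchanged row (peter here) shows up twice.
  val naive: Seq[Person] = original ++ transformed

  // Adjusted union: transformed rows plus only the originals that changed.
  val changedOriginals: Seq[Person] = original.filter(p => clampAge(p) != p)
  val adjusted: Seq[Person] = (transformed ++ changedOriginals).sortBy(_.id)

  def main(args: Array[String]): Unit = {
    println(s"naive rows: ${naive.size}, adjusted rows: ${adjusted.size}")
    adjusted.foreach(println)
  }
}
```

With the two sample rows, the naive union holds four rows while the adjusted one holds three.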

Edit 2: implementation using a nested array and explode

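import org.apache.spark.sql.functions.{explode, udf}
import sqlContext.implicits._   // enables the $"column" syntax

val df1 = sqlContext.createDataFrame(Seq((1, "john", 23), (2, "peter", 32))).
      toDF("id", "name", "age")

// Ages below 25 yield two values (original and default); others yield one.
val udfArr = udf[Array[Int], Int] { age =>
       if (age < 25) Array(age, 25) else Array(age) }

val df2 = df1.withColumn("age", udfArr($"age"))

df2.show()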
+---+-----+--------+
| id| name|     age|
+---+-----+--------+
|  1| john|[23, 25]|
|  2|peter|    [32]|
+---+-----+--------+


df2.withColumn("age", explode($"age")).show()
+---+-----+---+
| id| name|age|
+---+-----+---+
|  1| john| 23|
|  1| john| 25|
|  2|peter| 32|
+---+-----+---+
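The array-plus-explode approach is a one-to-many mapping, which in Scala is usually written as `flatMap`. The sketch below shows that shape on a plain `Seq`. It is not part of the answer above: `Person`, `expand` and `FlatMapSketch` are hypothetical names, and the collection stands in for the DataFrame. On Spark 2.x a typed `Dataset[Person]` offers a `flatMap` that accepts the same kind of function, provided an implicit encoder is in scope, so `expand` could probably be reused there as is.

```scala
// A minimal sketch of one-to-many row expansion on a plain Scala Seq.
// Person and expand are hypothetical names used only for illustration.
object FlatMapSketch {
  final case class Person(id: Int, name: String, age: Int)

  // One input row becomes two rows (original plus defaulted) when the
  // age is below 25, and stays a single row otherwise.
  def expand(p: Person): Seq[Person] =
    if (p.age < 25) Seq(p, p.copy(age = 25)) else Seq(p)

  val people: Seq[Person] =
    Seq(Person(1, "john", 23), Person(2, "peter", 32))

  // flatMap concatenates the per-row results into one flat sequence.
  val expanded: Seq[Person] = people.flatMap(expand)

  def main(args: Array[String]): Unit =
    expanded.foreach(println)
}
```

Compared with the UDF version, this keeps the rule in an ordinary function that can be unit-tested without a Spark session.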

Source

2016-07-05 05:15:52 WillemM

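How does your answer solve the problem? – eliasah

See implementations 1 and 2 above. – WillemM

It looks like the second option might work for me; I will try it and update. Thanks. – Pacchy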
