2017-03-02

Spark: Flatten nested data structure in a DataFrame

I need to flatten a DataFrame so that I can join it with another DataFrame in Spark (Scala).

Basically, my two DataFrames have the following schemas:

DF1

root 
|-- field1: string (nullable = true) 
|-- field2: long (nullable = true) 
|-- field3: long (nullable = true) 
|-- field4: long (nullable = true) 
|-- field5: integer (nullable = true) 
|-- field6: timestamp (nullable = true) 
|-- field7: long (nullable = true) 
|-- field8: long (nullable = true) 
|-- field9: long (nullable = true) 
|-- field10: integer (nullable = true) 

DF2

root 
|-- field1: long (nullable = true) 
|-- field2: long (nullable = true) 
|-- field3: string (nullable = true) 
|-- field4: integer (nullable = true) 
|-- field5: array (nullable = true) 
| |-- element: struct (containsNull = true) 
| | |-- field6: long (nullable = true) 
| | |-- field7: integer (nullable = true) 
| | |-- field8: array (nullable = true) 
| | | |-- element: struct (containsNull = true) 
| | | | |-- field9: string (nullable = true) 
| | | | |-- field10: integer (nullable = true) 
|-- field11: timestamp (nullable = true) 

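Honestly, I have no idea how to flatten DF2. In the end, I need to join the two DataFrames on DF1.field4 = DF2.field9.

I am using Spark 2.1.0.

My first thought was to use explode, but it has been deprecated in Spark 2.1.0. Does anyone have a clue for me?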

+0

I think with Datasets you could use flatMap? – LiMuBei

+1

I think you have to explode your arrays first and then join –

+0

The explode function is deprecated in Spark 2.1.0 – Oliviervs

Answer

1

I was wrong: the explode function is still available in Spark 2.1.0, as functions.explode in the org.apache.spark.sql package.

Thanks.

You can find the code below:

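import org.apache.spark.sql.functions

// One output row per element of the outer array field5
val DF2Exploded1 = DF2.select(DF2("*"),
  functions.explode(DF2("field5")).alias("field5_exploded"))

// One output row per element of the inner array field8
val DF2Exploded2 = DF2Exploded1.select(DF2Exploded1("*"),
  functions.explode(DF2Exploded1("field5_exploded.field8")).alias("field8_exploded"))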
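
To get from the exploded columns to the join asked about in the question, the struct members still have to be promoted to top-level columns. The following is only a sketch under stated assumptions: the case classes (`Rec1`, `Rec2`, `Elem5`, `Elem8`), the `flatten` helper, the `df2_field1` alias and the sample rows are all hypothetical stand-ins for a trimmed-down version of the schemas in the question, not the real data. Because field4 is a long in DF1 and field9 is a string in DF2, the sketch casts field9 explicitly before comparing.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical, trimmed-down stand-ins for the schemas in the question.
case class Elem8(field9: String, field10: Int)
case class Elem5(field6: Long, field7: Int, field8: Seq[Elem8])
case class Rec2(field1: Long, field5: Seq[Elem5])
case class Rec1(field1: String, field4: Long)

object FlattenJoinSketch {
  // Explode both array levels, then promote the struct members to top-level columns.
  def flatten(df2: DataFrame): DataFrame =
    df2
      .withColumn("field5_exploded", explode(col("field5")))
      .withColumn("field8_exploded", explode(col("field5_exploded.field8")))
      .select(
        col("field1").alias("df2_field1"), // renamed to avoid clashing with DF1.field1
        col("field5_exploded.field6").alias("field6"),
        col("field5_exploded.field7").alias("field7"),
        col("field8_exploded.field9").alias("field9"),
        col("field8_exploded.field10").alias("field10"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, for illustration only.
    val DF1 = Seq(Rec1("a", 1L), Rec1("b", 2L)).toDF()
    val DF2 = Seq(Rec2(10L, Seq(Elem5(100L, 7, Seq(Elem8("1", 5), Elem8("3", 6)))))).toDF()

    val DF2Flat = flatten(DF2)

    // field4 is a long and field9 a string in the question's schemas, so cast explicitly.
    val joined = DF1.join(DF2Flat, DF1("field4") === DF2Flat("field9").cast("long"))
    joined.show()

    spark.stop()
  }
}
```

Note that explode produces no output row for a record whose array is null or empty, so such records drop out of the flattened DataFrame and therefore out of the join.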
+1

Could you add the code to your answer? – mrsrinivas

+0

Hi Oliviervs, is there any way to avoid exploding each array one at a time, and to flatten the DataFrame in a single nested operation? –
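
One possible way to flatten both levels in a single operation, along the lines of the flatMap suggestion in the comments on the question, is to work with a typed Dataset and a for-comprehension over the nested sequences. This is a sketch only: the case classes (`Nested`, `Mid`, `Leaf`, `Flat`), the `flattenRow` helper and the sample row are hypothetical and cover a trimmed-down version of the DF2 schema. It also assumes that field5 and field8 are never null; a null array would throw a NullPointerException here.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case classes mirroring a trimmed-down DF2 schema.
case class Leaf(field9: String, field10: Int)
case class Mid(field6: Long, field7: Int, field8: Seq[Leaf])
case class Nested(field1: Long, field5: Seq[Mid])
case class Flat(field1: Long, field6: Long, field7: Int, field9: String, field10: Int)

object FlatMapSketch {
  // One Flat row per innermost element; assumes neither array is null.
  def flattenRow(n: Nested): Seq[Flat] =
    for {
      m <- n.field5
      l <- m.field8
    } yield Flat(n.field1, m.field6, m.field7, l.field9, l.field10)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatmap-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row, for illustration only.
    val ds = Seq(Nested(1L, Seq(Mid(10L, 2, Seq(Leaf("x", 1), Leaf("y", 2)))))).toDS()

    // Both nesting levels are flattened in a single flatMap.
    val flat = ds.flatMap(flattenRow _)
    flat.show()

    spark.stop()
  }
}
```

An existing DataFrame can be turned into such a Dataset with `.as[Nested]`, provided the case classes match its column names and types. The trade-off is that the flattening runs as an opaque Scala function on deserialized objects rather than as built-in column expressions like explode.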