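Spark: Flatten a nested data structure in a dataframe
I need to flatten a dataframe so that I can join it with another dataframe in Spark (Scala).
Basically, my 2 dataframes have the following schemas: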
DF1
root
|-- field1: string (nullable = true)
|-- field2: long (nullable = true)
|-- field3: long (nullable = true)
|-- field4: long (nullable = true)
|-- field5: integer (nullable = true)
|-- field6: timestamp (nullable = true)
|-- field7: long (nullable = true)
|-- field8: long (nullable = true)
|-- field9: long (nullable = true)
|-- field10: integer (nullable = true)
DF2
root
|-- field1: long (nullable = true)
|-- field2: long (nullable = true)
|-- field3: string (nullable = true)
|-- field4: integer (nullable = true)
|-- field5: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- field6: long (nullable = true)
| | |-- field7: integer (nullable = true)
| | |-- field8: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- field9: string (nullable = true)
| | | | |-- field10: integer (nullable = true)
|-- field11: timestamp (nullable = true)
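Honestly, I have no idea how to flatten DF2. In the end, I need to join the 2 dataframes on DF.field4 = DF2.field9
I am using Spark 2.1.0.
My first thought was to use explode, but it is deprecated in Spark 2.1.0. Does anyone have a clue for me?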
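For reference, the deprecation applies to the `Dataset.explode` method; the `explode` function in `org.apache.spark.sql.functions` is still available in Spark 2.1.0. The following is a minimal sketch, not a definitive solution, of flattening a DF2-like frame with that function and then joining. The case classes, the `flattenDf2` helper, the `f5`/`f8` and `df2_field1` column names, and the sample data are all made up for illustration, and it assumes the join is meant to be DF1.field4 against DF2's nested field9 (a long against a string, hence the cast).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical case classes mirroring part of the DF2 schema; used only to build sample data.
case class Inner(field9: String, field10: Int)
case class Outer(field6: Long, field7: Int, field8: Seq[Inner])
case class Df2Row(field1: Long, field5: Seq[Outer])

object FlattenWithExplode {
  // Produces one output row per (field5 element, field8 element) pair.
  // Note: explode drops rows whose array is null or empty.
  def flattenDf2(df2: DataFrame): DataFrame =
    df2
      .withColumn("f5", explode(col("field5")))    // one row per element of field5
      .withColumn("f8", explode(col("f5.field8"))) // one row per element of the nested field8
      .select(
        col("field1").as("df2_field1"),            // renamed to avoid clashing with DF1.field1
        col("f5.field6").as("field6"),
        col("f5.field7").as("field7"),
        col("f8.field9").as("field9"),
        col("f8.field10").as("field10"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-with-explode")
      .getOrCreate()
    import spark.implicits._

    // Sample stand-ins for DF2 and DF1 (only the columns needed here).
    val df2 = Seq(
      Df2Row(1L, Seq(Outer(10L, 1, Seq(Inner("100", 7), Inner("200", 8)))))
    ).toDF()
    val df1 = Seq(("a", 100L), ("b", 300L)).toDF("field1", "field4")

    val flat = flattenDf2(df2)

    // Assumption: field9 holds numeric strings, so it is cast to long to match field4.
    val joined = df1.join(flat, df1("field4") === flat("field9").cast("long"))
    joined.show()

    spark.stop()
  }
}
```

Each `explode` multiplies the row count by the array length, so it may be worth selecting only the columns that the join needs before exploding.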
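I would go with Datasets; then you can use flatMap? – LiMuBei
I think you have to explode your array first and then join –
The explode function is deprecated in Spark 2.1.0 – Oliviervs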
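The Dataset/flatMap idea from the comments could look roughly like the sketch below. The case classes (`Leaf`, `Item`, `Nested`, `Flat`) and the `flatten` helper are hypothetical names chosen for illustration, and only a subset of DF2's columns is modelled. It also assumes the primitive fields and the array elements themselves are never null; with the nullable schema shown above, `Option[...]` fields would be the safer choice.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical typed view of the nested part of DF2.
case class Leaf(field9: String, field10: Int)
case class Item(field6: Long, field7: Int, field8: Seq[Leaf])
case class Nested(field1: Long, field5: Seq[Item])

// Hypothetical flat target record.
case class Flat(field1: Long, field6: Long, field7: Int, field9: String, field10: Int)

object FlattenWithFlatMap {
  // Emits one Flat record per (field5 element, field8 element) pair.
  def flatten(ds: Dataset[Nested]): Dataset[Flat] = {
    val spark = ds.sparkSession
    import spark.implicits._ // provides the Encoder[Flat] that flatMap needs

    ds.flatMap { n =>
      for {
        item <- Option(n.field5).getOrElse(Seq.empty)    // tolerate a null outer array
        leaf <- Option(item.field8).getOrElse(Seq.empty) // tolerate a null inner array
      } yield Flat(n.field1, item.field6, item.field7, leaf.field9, leaf.field10)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-with-flatmap")
      .getOrCreate()
    import spark.implicits._

    // Sample data standing in for DF2.
    val nested = Seq(
      Nested(1L, Seq(Item(10L, 1, Seq(Leaf("100", 7), Leaf("200", 8))))),
      Nested(2L, Seq.empty)
    ).toDS()

    flatten(nested).show()

    spark.stop()
  }
}
```

For an untyped dataframe whose column names and types match the case classes, `df2.as[Nested]` gives the typed Dataset to pass in. The result can be turned back into a dataframe with `.toDF()` and joined as usual.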