Transforming a DataFrame with a nested structure
I have a Spark DataFrame that looks like this:
root
|-- employeeName: string (nullable = true)
|-- employeeId: string (nullable = true)
|-- employeeEmail: string (nullable = true)
|-- company: struct (nullable = true)
| |-- companyName: string (nullable = true)
| |-- companyId: string (nullable = true)
| |-- details: struct (nullable = true)
| | |-- founded: string (nullable = true)
| | |-- address: string (nullable = true)
| | |-- industry: string (nullable = true)
What I want to do is group by companyId, so that each company gets an array of its employees, like this:
root
|-- company: struct (nullable = true)
| |-- companyName: string (nullable = true)
| |-- companyId: string (nullable = true)
| |-- details: struct (nullable = true)
| | |-- founded: string (nullable = true)
| | |-- address: string (nullable = true)
| | |-- industry: string (nullable = true)
|-- employees: array (nullable = true)
| |-- employee: struct (nullable = true)
| | |-- employeeName: string (nullable = true)
| | |-- employeeId: string (nullable = true)
| | |-- employeeEmail: string (nullable = true)
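Of course, if I only had pairs of (company, employee): (String, String), I could do this easily with map and reduceByKey. But with all the different nested information, I'm not sure what approach to take.
Should I try to flatten everything? Any examples of doing something similar would be very helpful.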
Thanks, I managed to solve it in a similar way. – Dmitri