I have a DataFrame holding the rows returned by a SQL query. How do I convert this flat DataFrame to nested JSON in Spark (Scala or Java)?
id,type,name,ppu,batter.id,batter.type,topping.id,topping.type
101,donut,cake,0.55,1001,Regular,5001,None
101,donut,cake,0.55,1002,Chocolate,5001,None
101,donut,cake,0.55,1003,Blueberry,5001,None
101,donut,cake,0.55,1004,Devil's Food,5001,None
101,donut,cake,0.55,1001,Regular,5002,Glazed
101,donut,cake,0.55,1002,Chocolate,5002,Glazed
101,donut,cake,0.55,1003,Blueberry,5002,Glazed
101,donut,cake,0.55,1004,Devil's Food,5002,Glazed
101,donut,cake,0.55,1001,Regular,5003,Chocolate
101,donut,cake,0.55,1002,Chocolate,5003,Chocolate
101,donut,cake,0.55,1003,Blueberry,5003,Chocolate
101,donut,cake,0.55,1004,Devil's Food,5003,Chocolate
With the data set up like this, I need to convert it into a nested JSON structure like the following:
{
  "id": "101",
  "type": "donut",
  "name": "Cake",
  "ppu": 0.55,
  "batter":
    [
      { "id": "1001", "type": "Regular" },
      { "id": "1002", "type": "Chocolate" },
      { "id": "1003", "type": "Blueberry" },
      { "id": "1004", "type": "Devil's Food" }
    ],
  "topping":
    [
      { "id": "5001", "type": "None" },
      { "id": "5002", "type": "Glazed" },
      { "id": "5003", "type": "Chocolate" }
    ]
}
Is it possible to do this with a DataFrame aggregation, or is it something I have to write as a custom transformation?
I found a similar question here, Writing nested JSON in spark scala, but it does not have quite the right answer.
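One possible direction (a minimal sketch under stated assumptions, not a verified answer): group the flat rows by the donut-level columns and collect the distinct batter/topping pairs into arrays of structs, then let Spark serialize each row to JSON. This assumes a Spark version where collect_set and struct are available as DataFrame functions; the column names follow the CSV header above, and "donuts.csv" / "nested_donuts" are placeholder paths.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("FlatToNestedJson").getOrCreate()

// Flat input with the header shown above ("donuts.csv" is a placeholder path)
val flatDF = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("donuts.csv")

// Backticks are needed because the source column names contain dots
val nestedDF = flatDF
  .groupBy(col("id"), col("type"), col("name"), col("ppu"))
  .agg(
    // collect_set drops the duplicates created by the batter x topping cross product
    collect_set(struct(col("`batter.id`").as("id"), col("`batter.type`").as("type"))).as("batter"),
    collect_set(struct(col("`topping.id`").as("id"), col("`topping.type`").as("type"))).as("topping")
  )

// Inspect a couple of documents, then write them out without collecting to the driver
nestedDF.toJSON.show(2, truncate = false)
nestedDF.write.mode("overwrite").json("nested_donuts")

The aggregation itself runs on the executors, so nothing larger than the result files has to fit on the driver; whether this scales to the 100-column case mentioned below would need to be checked.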
Thanks Chitral, but unfortunately in my case, with 100 columns and at least 2 levels of nesting, this could get complicated. Also, in my case the data may not fit on the driver, so I am looking to do the transformation on the distributed nodes. I could use a UDAF to some extent, but only for simple objects with a single nested object. I have heard Spark 2.0 has better support for this, but I am not sure. Thanks again for your effort :) – BRKumaran