
I have a DataFrame holding the result of a SQL query. How can I convert this flat DataFrame into nested JSON in Spark (Scala or Java)?

id,type,name,ppu,batter.id,batter.type,topping.id,topping.type 
101,donut,cake,0.55,1001,Regular,5001,None 
101,donut,cake,0.55,1002,Chocolate,5001,None 
101,donut,cake,0.55,1003,Blueberry,5001,None 
101,donut,cake,0.55,1004,Devil's Food,5001,None 
101,donut,cake,0.55,1001,Regular,5002,Glazed 
101,donut,cake,0.55,1002,Chocolate,5002,Glazed 
101,donut,cake,0.55,1003,Blueberry,5002,Glazed 
101,donut,cake,0.55,1004,Devil's Food,5002,Glazed 
101,donut,cake,0.55,1001,Regular,5003,Chocolate 
101,donut,cake,0.55,1002,Chocolate,5003,Chocolate 
101,donut,cake,0.55,1003,Blueberry,5003,Chocolate 
101,donut,cake,0.55,1004,Devil's Food,5003,Chocolate 

I need to convert this flat result into a nested JSON structure like the one below.

{ 
    "id": "101", 
    "type": "donut", 
    "name": "Cake", 
    "ppu": 0.55, 
    "batter": 
     [ 
      { "id": "1001", "type": "Regular" }, 
      { "id": "1002", "type": "Chocolate" }, 
      { "id": "1003", "type": "Blueberry" }, 
      { "id": "1004", "type": "Devil's Food" } 
     ], 
    "topping": 
     [ 
      { "id": "5001", "type": "None" }, 
      { "id": "5002", "type": "Glazed" }, 
      { "id": "5003", "type": "Chocolate" } 
     ] 
} 

Is it possible to do this with a DataFrame aggregation, or is a custom transformation that I have to write the only option?

I found a similar question here, Writing nested JSON in spark scala, but it does not have a proper answer.

Answer


So, apparently there is no direct way to do this through the DataFrame API. You can use

df.toJSON 

but it won't give you the output you want.
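
For reference, toJSON serializes each row independently, so on this data it produces one flat JSON document per row rather than the nested structure above, roughly like this (exact types depend on your query's schema):

{"id":101,"type":"donut","name":"cake","ppu":0.55,"batter.id":1001,"batter.type":"Regular","topping.id":5001,"topping.type":"None"} 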

You'll have to write a somewhat messy transformation, and I'd love to hear about any other possible solutions. I'm assuming your result fits in memory, since it has to be brought back to the driver. Also, I'm using the Gson API here to build the JSON.

import org.apache.spark.sql.Row 
import com.google.gson.{JsonArray, JsonObject} 

def arrToJson(arr: Array[Row]): JsonObject = { 
  val jo = new JsonObject 

  // Group rows by the parent columns (id, type, name, ppu); the remaining 
  // four columns hold the batter and topping values. 
  arr.map(row => (row(0) + "," + row(1) + "," + row(2) + "," + row(3), 
                  row(4) + "," + row(5) + "," + row(6) + "," + row(7))) 
    .groupBy(_._1) 
    .map { case (parent, pairs) => (parent.split(","), pairs.map(_._2.split(","))) } 
    .foreach { case (parent, children) => 

      jo.addProperty("id", parent(0)) 
      jo.addProperty("type", parent(1)) 
      jo.addProperty("name", parent(2)) 
      jo.addProperty("ppu", parent(3)) 

      val bja = new JsonArray 
      val tja = new JsonArray 

      // Deduplicate the child values so each batter/topping appears only once 
      // even though the flat rows repeat every batter x topping combination. 
      children.map(c => (c(0), c(1))).distinct.foreach { case (id, tpe) => 
        val bjo = new JsonObject 
        bjo.addProperty("id", id) 
        bjo.addProperty("type", tpe) 
        bja.add(bjo) 
      } 
      children.map(c => (c(2), c(3))).distinct.foreach { case (id, tpe) => 
        val tjo = new JsonObject 
        tjo.addProperty("id", id) 
        tjo.addProperty("type", tpe) 
        tja.add(tjo) 
      } 

      jo.add("batter", bja) 
      jo.add("topping", tja) 
    } 

  jo 
} 
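
A usage sketch, assuming df is the DataFrame returned by your query and its result is small enough to collect to the driver:

// Collect the flat result to the driver, then build the nested JSON document. 
val json = arrToJson(df.collect()) 
println(json.toString) 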

Thanks Chitral, but unfortunately in my case, with 100 columns and at least two levels of nesting, this could get quite complicated. Also, in my case the data may not fit on the driver, so I'm looking to do the transformation on the distributed nodes. I could use a UDAF to some extent, but only for simple objects with a single nested object. I've heard Spark 2.0 has better support for this, but I'm not sure. Thanks again for your effort :) – BRKumaran
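
For anyone looking for a distributed alternative: on Spark 2.x the same nesting can be expressed with built-in aggregations, so nothing has to be collected to the driver. A rough sketch, assuming the flat DataFrame df has the columns shown in the question (backticks are needed because the column names contain dots; the output path is hypothetical):

import org.apache.spark.sql.functions._ 

// Group by the parent columns and gather the distinct child structs into arrays. 
val nested = df 
  .groupBy("id", "type", "name", "ppu") 
  .agg( 
    collect_set(struct(col("`batter.id`").as("id"), col("`batter.type`").as("type"))).as("batter"), 
    collect_set(struct(col("`topping.id`").as("id"), col("`topping.type`").as("type"))).as("topping")) 

// Each record of the result is one nested JSON document. 
nested.write.json("/tmp/nested_donuts")   // or inspect with nested.toJSON.show(false) 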