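I have code that transforms the rows of a DataFrame, but I am having trouble getting the output as an array. How can I convert DataFrame rows into JSON output with an array using Spark DataFrames?
Input: file.txt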
+-------------------------------+--------------------+-------+
|id |var |score |
+-------------------------------+--------------------+-------+
|12345 |A |8 |
|12345 |B |9 |
|12345 |C |7 |
|12345 |D |6 |
+-------------------------------+--------------------+-------+
Output:
{"id":"12345","props":[{"var":"A","score":"8"},{"var":"B","score":"9"},{"var":"C","score":"7"},{"var":"D","score":"6"}]}
I tried using collect_list without success. My code is in Scala:
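import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.{col, collect_list, udf}

val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)
val df = sqlContext.read.json("file.txt")
val dfCol = df.select(
  df("id"),
  df("var"),
  df("score"))
dfCol.show(false)
// Concatenates var and score into a single string such as "A,8.0".
// The parameter is named v because var is a reserved word in Scala.
val merge = udf { (v: String, score: Double) =>
  v + "," + score
}
// Collects the merged strings per id into an array column named props.
val grouped = dfCol.groupBy(col("id"))
  .agg(collect_list(merge(col("var"), col("score"))).alias("props"))
grouped.show(false)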
My question is: how do I convert the DataFrame rows into JSON output that contains an array?
Thanks.
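Why don't you try grouping the DF by id and then writing the DF itself out as a JSON file? I would expect that to return var and score as an array under props. – Shankar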
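The suggestion in the comment above could be sketched roughly as follows. This is a minimal sketch rather than a confirmed answer. It assumes Spark 2.0 or later, since collect_list over a struct column may not be supported by the Hive-based aggregate in Spark 1.x. It uses SparkSession instead of the HiveContext from the question. The object name RowsToJsonArray and the helper groupProps are hypothetical names introduced only for illustration, and the sample rows are typed in by hand from the table in the question with score kept as a string so that it is quoted in the JSON.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

object RowsToJsonArray {
  // Hypothetical helper: packs each (var, score) pair into a struct and
  // collects those structs per id into an array column named props.
  def groupProps(df: DataFrame): DataFrame =
    df.groupBy(col("id"))
      .agg(collect_list(struct(col("var"), col("score"))).alias("props"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only to try the idea out.
    val spark = SparkSession.builder()
      .appName("RowsToJsonArray")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows mirroring the table in the question.
    val df = Seq(
      ("12345", "A", "8"),
      ("12345", "B", "9"),
      ("12345", "C", "7"),
      ("12345", "D", "6")
    ).toDF("id", "var", "score")

    val grouped = groupProps(df)

    // toJSON renders every grouped row as one JSON string.
    grouped.toJSON.collect().foreach(println)

    spark.stop()
  }
}
```

Using struct instead of a string-concatenating UDF keeps var and score as separate named fields, so each element of props is serialized as a JSON object rather than a string like "A,8.0". Note that collect_list does not guarantee the order of the elements within props. To write files instead of printing, grouped.write.json with an output path of your choice should work in the same way.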