Spark 2.0 - flatten a JSON file to CSV

I am trying to flatten a JSON file and convert it to CSV, but I have not had any success so far. I tried the code below, but I don't know how to correctly manipulate the `qualify` column with Spark SQL so that it returns the right values.
from pyspark.sql.functions import explode

dummy = spark.read.json('dummy-3.json')
# explode turns each element of the qualify array into its own row
qualify = dummy.select("user_id", "rec_id", "uut", "hash", explode("qualify").alias("qualify"))
qualify.show()
+-------+------+---+------+--------------------+
|user_id|rec_id|uut| hash| qualify|
+-------+------+---+------+--------------------+
| 1| 2| 12|abc123|[cab321,test-1,of...|
| 1| 2| 12|abc123|[cab123,test-2,of...|
+-------+------+---+------+--------------------+
Sample JSON:
{
  "user_id": 1,
  "rec_id": 2,
  "uut": 12,
  "hash": "abc123",
  "qualify": [{
    "offer": "offer-1",
    "name": "test-1",
    "hash": "cab321",
    "qualified": false,
    "rules": [{
      "name": "name of rule 1",
      "approved": true,
      "details": {}
    },
    {
      "name": "name of rule 2",
      "approved": false,
      "details": {}
    }]
  },
  {
    "offer": "offer-2",
    "name": "test-2",
    "hash": "cab123",
    "qualified": true,
    "rules": [{
      "name": "name of rule 1",
      "approved": true,
      "details": {}
    },
    {
      "name": "name of rule 2",
      "approved": false,
      "details": {}
    }]
  }]
}
JSON schema:
root
|-- hash: string (nullable = true)
|-- qualify: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- hash: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- offer: string (nullable = true)
| | |-- qualified: boolean (nullable = true)
| | |-- rules: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- approved: boolean (nullable = true)
| | | | |-- name: string (nullable = true)
|-- rec_id: long (nullable = true)
|-- user_id: long (nullable = true)
|-- uut: long (nullable = true)
I tried transforming the DataFrame into an RDD and writing a map function to return the values, but for some reason I don't think that is a good approach. Am I wrong?
Has anyone worked on a similar problem?
Thanks for any help.
Have you tried putting 'qualified.*' into your select query instead of 'explode'? – Zyoma