2016-11-15

Spark SQL: flatten JSON

I have a JSON document that looks like this:

{"name":"Michael", "cities":["palo alto", "menlo park"], "schools":[{"sname":"stanford", "year":2010}, {"sname":"berkeley","year":2012}]} 

I want the output stored in a CSV file like this:

Michael,{"sname":"stanford", "year":2010} 

Michael,{"sname":"berkeley", "year":2012} 

I have tried the following:

val people = sqlContext.read.json("people.json") 
val flattened = people.select($"name", explode($"schools").as("schools_flat")) 

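The code above does not give schools_flat as JSON: each exploded element comes back as a struct rather than as a JSON string. Any ideas on how to get the expected output?

Thanks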

Answer


You need to specify the schema explicitly so that the JSON file is read the way you want. In this case, it would look like this:

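import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.types.StructType
import sqlContext.implicits._ // enables the $"col" syntax (already in scope in spark-shell)

// Declaring schools as Array[String] makes Spark keep each array element as its raw JSON text
case class json_schema_class(cities: String, name: String, schools: Array[String])
val json_schema = ScalaReflection.schemaFor[json_schema_class].dataType.asInstanceOf[StructType]

// Read with the explicit schema, then emit one row per element of schools
val people = sqlContext.read.schema(json_schema).json("people.json")
val flattened = people.select($"name", explode($"schools").as("schools_flat"))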

The 'flattened' DataFrame then looks like this:

+-------+--------------------+ 
|   name|        schools_flat|
+-------+--------------------+ 
|Michael|{"sname":"stanfor...| 
|Michael|{"sname":"berkele...| 
+-------+--------------------+
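
As a possible alternative, here is a minimal sketch that keeps the inferred schema and turns each exploded struct back into a JSON string with `to_json`. It assumes Spark 2.2 or later with a `SparkSession` (the code above uses the older `sqlContext`); the object name `FlattenSchools`, the helper `flattenLines`, and the column names `school` and `line` are made up for illustration. Because a JSON string contains commas and quotes, the sketch joins the name and the JSON into a single text line instead of relying on the CSV writer, which would quote and escape the JSON field.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat_ws, explode, to_json}

object FlattenSchools {
  // Hypothetical helper: returns one "name,{json}" line per element of the schools array.
  def flattenLines(spark: SparkSession, jsonRecords: Seq[String]): Seq[String] = {
    import spark.implicits._

    // The schema is inferred here, so schools is an array of structs (sname, year).
    val people = spark.read.json(jsonRecords.toDS())

    people
      // One row per school; each element is still a struct at this point.
      .select($"name", explode($"schools").as("school"))
      // to_json serializes the struct back to compact JSON; concat_ws prepends the name.
      .select(concat_ws(",", $"name", to_json($"school")).as("line"))
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only; a real job would get its master from spark-submit.
    val spark = SparkSession.builder().master("local[1]").appName("flatten-schools").getOrCreate()
    val record =
      """{"name":"Michael", "cities":["palo alto", "menlo park"], "schools":[{"sname":"stanford", "year":2010}, {"sname":"berkeley","year":2012}]}"""
    flattenLines(spark, Seq(record)).foreach(println)
    spark.stop()
  }
}
```

To write files instead of collecting to the driver, the final `.as[String].collect().toSeq` could be replaced with `.write.text(...)` and an output directory of your choice, since the result is a single string column. Note that the JSON comes out in compact form, without the space after each comma shown in the question.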