2017-03-06

With Spark SQL, I can use

import org.apache.spark.sql.SparkSession

val spark = SparkSession
     .builder()
     .appName("SparkSessionZipsExample")
     .master("local")
     .config("spark.sql.warehouse.dir", "warehouseLocation-value")
     .getOrCreate()

val df = spark.read.json("source/myRecords.json") 
df.createOrReplaceTempView("shipment") 
val sqlDF = spark.sql("SELECT * FROM shipment") 

to read the data from "myRecords.json". The structure of the JSON file is:

df.printSchema() 
root 
|-- _id: struct (nullable = true) 
| |-- $oid: string (nullable = true) 
|-- container: struct (nullable = true) 
| |-- barcode: string (nullable = true) 
| |-- code: string (nullable = true) 

I can get specific columns of this JSON like this:

val sqlDF = spark.sql("SELECT container.barcode, container.code FROM shipment") 

But how can I get id.$oid from this JSON file? I tried "SELECT id.$oid FROM shipment_log" and "SELECT id.\$oid FROM shipment_log", but neither works. The error message is:

error: invalid escape character 
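The "invalid escape character" message is reported by the Scala compiler, not by Spark: `\$` is not a recognised escape sequence in an ordinary Scala string literal, while a bare `$` needs no escaping there at all. So the first query string compiles and reaches Spark unchanged, and it is the SQL side that has to be told `$oid` is an identifier. A minimal pure-Scala sketch (the object name `EscapeDemo` is hypothetical):

```scala
object EscapeDemo {
  // First attempt from the question: this compiles, because `$` is an
  // ordinary character in a plain (non-interpolated) string literal.
  val firstAttempt: String = "SELECT id.$oid FROM shipment_log"

  // Second attempt: "SELECT id.\$oid FROM shipment_log" is rejected by the
  // compiler ("invalid escape character") since `\$` is not an escape.
  // A triple-quoted literal also keeps `$` (and backslashes) verbatim.
  val tripleQuoted: String = """SELECT id.$oid FROM shipment_log"""

  def main(args: Array[String]): Unit = {
    println(firstAttempt)                 // SELECT id.$oid FROM shipment_log
    println(firstAttempt == tripleQuoted) // true
  }
}
```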

Can anyone tell me how I can get id.$oid?

Answer


Backticks are your friend:

spark.read.json(sc.parallelize(Seq(
    """{"_id": {"$oid": "foo"}}""") 
)).createOrReplaceTempView("df") 

spark.sql("SELECT _id.`$oid` FROM df").show 
+----+ 
|$oid| 
+----+ 
| foo| 
+----+ 
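If the field names come from data rather than being typed by hand, the backtick quoting can be applied mechanically. A minimal sketch with a hypothetical helper object `SqlIdent`, assuming Spark SQL's rule that a backtick inside a backtick-quoted identifier is written as two backticks:

```scala
object SqlIdent {
  // Hypothetical helper: wrap one name part in backticks, doubling any
  // backtick it already contains.
  def quote(part: String): String = "`" + part.replace("`", "``") + "`"

  // Quote each part of a nested field path and join the parts with dots.
  def path(parts: String*): String = parts.map(quote).mkString(".")

  def main(args: Array[String]): Unit = {
    val oidPath = path("_id", "$oid")
    println("SELECT " + oidPath + " FROM df") // SELECT `_id`.`$oid` FROM df
  }
}
```

Quoting every part, not only `$oid`, should be harmless: `` `_id`.`$oid` `` refers to the same nested field as `` _id.`$oid` ``.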

DataFrame API:

spark.table("df").select($"_id".getItem("$oid")).show 
+--------+
|_id.$oid|
+--------+
|     foo|
+--------+

spark.table("df").select($"_id.$$oid").show
+----+
|$oid|
+----+
| foo|
+----+
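The `$$` in the last example is string-interpolation escaping. As far as I know, the `$"..."` syntax from `spark.implicits._` is an implicit class on `StringContext` that builds the column name with the standard `s` interpolator, so `$$` becomes a literal `$`, while a bare `$oid` would be read as a reference to a Scala variable named `oid`. The pure-Scala sketch below mimics that syntax with a hypothetical `ColumnNameSyntax` class that returns a plain `String` instead of a Spark `ColumnName`:

```scala
object DollarInterpolatorDemo {
  // Hypothetical stand-in for the `$"..."` syntax from spark.implicits._:
  // it builds the name with the standard `s` interpolator, but returns a
  // plain String here rather than a Spark ColumnName.
  implicit class ColumnNameSyntax(val sc: StringContext) extends AnyVal {
    def $(args: Any*): String = sc.s(args: _*)
  }

  // `$$` is the s-interpolator escape for a literal `$`. Writing `$oid`
  // instead would make the compiler look for a variable named `oid`.
  val columnName: String = $"_id.$$oid"

  def main(args: Array[String]): Unit =
    println(columnName) // _id.$oid
}
```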