2017-06-14 66 views
0

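Spark SQL: selecting an ambiguous column when running a SQL query

I have a DataFrame that is the join of two other DataFrames. I want to run a SQL query against it, but I don't know how to tell the two id columns apart. I tried qualifying the column with the original table name, but had no luck.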

Schemas

blogs:

root 
|-- id: integer (nullable = false) 
|-- author: string (nullable = true) 
|-- title: string (nullable = true) 

comments:

root 
|-- id: integer (nullable = false) 
|-- blog_id: integer (nullable = false) 
|-- author: string (nullable = true) 
|-- comment: string (nullable = true) 

blogs joined with comments:

root 
|-- id: integer (nullable = true) 
|-- author: string (nullable = true) 
|-- title: string (nullable = true) 
|-- id: integer (nullable = true) 
|-- blog_id: integer (nullable = true) 
|-- author: string (nullable = true) 
|-- comment: string (nullable = true) 

Attempted queries

scala> spark.sql("SELECT id FROM joined") 
12:17:26.981 [run-main-0] INFO org.apache.spark.sql.execution.SparkSqlParser - Parsing command: SELECT id FROM joined 
org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could be: id#7, id#23.; line 1 pos 7 

scala> spark.sql("SELECT blogs.id FROM joined") 
org.apache.spark.sql.AnalysisException: cannot resolve '`blogs.id`' given input columns: [blog_id, id, comment, title, author, author, id]; line 1 pos 7; 
'Project ['blogs.id] 
+- SubqueryAlias joined, `joined` 
    +- Join FullOuter, (id#7 = blog_id#24) 
     :- Project [_1#0 AS id#7, _2#1 AS author#8, _3#2 AS title#9] 
     : +- LocalRelation [_1#0, _2#1, _3#2] 
     +- Project [_1#14 AS id#23, _2#15 AS blog_id#24, _3#16 AS author#25, _4#17 AS comment#26] 
     +- LocalRelation [_1#14, _2#15, _3#16, _4#17] 

Answers

-2

Your query has a typo.

spark.sql("SELECT blogs.id FROM joined") 

should be

spark.sql("SELECT blog.id FROM joined") 
+0

No, there is no typo. Why would it be blog instead of blogs? – aclokay

0

You have probably joined the two DataFrames like this:

val df = left.join(right, left.col("name") === right.col("name")) 

Here the join is on the name column, so that column is duplicated in the joined DF.

To fix this, specify the join column by name:

val df = left.join(right, Seq("name")) 

This drops the duplicate column from the joined DF, and you can then query it without any problem.
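A minimal sketch of the difference between the two join forms, meant to be pasted into spark-shell or a similar REPL. The sample rows and the column names left_value and right_value are made up for illustration and are not from the question. Note that this only helps when the join key has the same name on both sides.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("using-join-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data (not from the question).
val left  = Seq(("alice", 1), ("bob", 2)).toDF("name", "left_value")
val right = Seq(("alice", 10), ("carol", 30)).toDF("name", "right_value")

// Join on an expression: both `name` columns survive,
// so a later reference to "name" is ambiguous.
val byExpr = left.join(right, left.col("name") === right.col("name"))

// Join on a column name: the shared `name` column appears only once.
val byName = left.join(right, Seq("name"))

println(byExpr.columns.mkString(", ")) // name, left_value, name, right_value
println(byName.columns.mkString(", ")) // name, left_value, right_value
```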

+0

The join is not on the problematic columns. It is id = blog_id – aclokay

+0

Oh! Then you need to specify aliases when joining to avoid the ambiguity –

+0

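I'm looking for a solution that doesn't require me to specify aliases explicitly. I mean, if I ran the query at the SQL level, qualifying the column with the table it belongs to would work. I expected to be able to do the same at the Spark SQL level. – aclokay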
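For reference, two ways around the ambiguity can be sketched as follows (REPL-style, for spark-shell). The sample rows and the renamed columns b_id, blog_author, comment_id and comment_author are assumptions made for illustration. The sketch relies on the behaviour visible in the plan above: once the join result is registered as the view joined, its columns are qualified only by joined, so blogs.id no longer resolves. Option A keeps the join inside SQL, where the table qualifiers are still in scope; Option B renames the clashing columns before registering the view.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("ambiguous-id-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical rows shaped like the schemas in the question.
val blogs = Seq((1, "ann", "First post"), (2, "bob", "Second post"))
  .toDF("id", "author", "title")
val comments = Seq((10, 1, "cat", "Nice"), (11, 1, "dan", "Agreed"))
  .toDF("id", "blog_id", "author", "comment")

// Option A: register the inputs and write the join in SQL,
// so that `blogs.id` and `comments.id` can be told apart.
blogs.createOrReplaceTempView("blogs")
comments.createOrReplaceTempView("comments")
val blogIds = spark.sql(
  """SELECT blogs.id
    |FROM blogs FULL OUTER JOIN comments
    |  ON blogs.id = comments.blog_id""".stripMargin)

// Option B: alias each side for the join, then rename the clashing
// columns so the registered view has unique column names.
val joined = blogs.as("b")
  .join(comments.as("c"), col("b.id") === col("c.blog_id"), "full_outer")
  .select(
    col("b.id").as("b_id"),
    col("b.author").as("blog_author"),
    col("b.title"),
    col("c.id").as("comment_id"),
    col("c.blog_id"),
    col("c.author").as("comment_author"),
    col("c.comment"))
joined.createOrReplaceTempView("joined")
val fromView = spark.sql("SELECT b_id FROM joined")
```

Option A avoids aliases entirely, which is closest to what the last comment asks for; Option B is only needed when the joined view itself has to be queried later.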