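Spark SQL: selecting an ambiguous column when running SQL
I have a DataFrame that is the join of two other DataFrames. I want to run a SQL query against it, but I don't know how to disambiguate the id columns. I tried qualifying the column with the original table name, but had no luck.
Schemas:
blogs: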
root
|-- id: integer (nullable = false)
|-- author: string (nullable = true)
|-- title: string (nullable = true)
comments:
root
|-- id: integer (nullable = false)
|-- blog_id: integer (nullable = false)
|-- author: string (nullable = true)
|-- comment: string (nullable = true)
blogs joined with comments:
root
|-- id: integer (nullable = true)
|-- author: string (nullable = true)
|-- title: string (nullable = true)
|-- id: integer (nullable = true)
|-- blog_id: integer (nullable = true)
|-- author: string (nullable = true)
|-- comment: string (nullable = true)
Attempted queries:
scala> spark.sql("SELECT id FROM joined")
12:17:26.981 [run-main-0] INFO org.apache.spark.sql.execution.SparkSqlParser - Parsing command: SELECT id FROM joined
org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could be: id#7, id#23.; line 1 pos 7
scala> spark.sql("SELECT blogs.id FROM joined")
org.apache.spark.sql.AnalysisException: cannot resolve '`blogs.id`' given input columns: [blog_id, id, comment, title, author, author, id]; line 1 pos 7;
'Project ['blogs.id]
+- SubqueryAlias joined, `joined`
   +- Join FullOuter, (id#7 = blog_id#24)
      :- Project [_1#0 AS id#7, _2#1 AS author#8, _3#2 AS title#9]
      :  +- LocalRelation [_1#0, _2#1, _3#2]
      +- Project [_1#14 AS id#23, _2#15 AS blog_id#24, _3#16 AS author#25, _4#17 AS comment#26]
         +- LocalRelation [_1#14, _2#15, _3#16, _4#17]
No, there's no typo. Why would it be blog rather than blogs? – aclokay
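
For what it's worth, the plan in the error output shows only `SubqueryAlias joined`, so the view seems to carry no `blogs` qualifier at all, which would explain why `blogs.id` cannot be resolved. Below is a minimal sketch of two possible workarounds, not a definitive fix. The sample rows, the object name `AmbiguousColumns`, and the new column names `blog_author`, `comment_id` and `comment_author` are made up for illustration; only the schemas and the view name `joined` come from the question.

```scala
import org.apache.spark.sql.SparkSession

object AmbiguousColumns {
  def main(args: Array[String]): Unit = {
    // In spark-shell a session already exists; getOrCreate() simply returns it.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ambiguous-id")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data matching the schemas in the question.
    val blogs = Seq((1, "alice", "First post")).toDF("id", "author", "title")
    val comments = Seq((10, 1, "bob", "Nice post")).toDF("id", "blog_id", "author", "comment")

    // Option 1: rename the clashing columns before joining, so that every
    // column in the "joined" view has a unique name.
    val renamedBlogs = blogs.withColumnRenamed("author", "blog_author")
    val renamedComments = comments
      .withColumnRenamed("id", "comment_id")
      .withColumnRenamed("author", "comment_author")
    val joined = renamedBlogs.join(renamedComments, $"id" === $"blog_id", "outer")
    joined.createOrReplaceTempView("joined")
    spark.sql("SELECT id, comment_id FROM joined").show()

    // Option 2: register the two inputs as views and do the join in SQL,
    // where table aliases are available to qualify each id.
    blogs.createOrReplaceTempView("blogs")
    comments.createOrReplaceTempView("comments")
    spark.sql(
      """SELECT b.id AS blog_id, c.id AS comment_id
        |FROM blogs b
        |FULL OUTER JOIN comments c ON b.id = c.blog_id""".stripMargin
    ).show()

    spark.stop()
  }
}
```

Option 1 keeps the existing `joined` view usable from plain SQL; Option 2 avoids renaming but moves the join itself into the query.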