Spark 2.1.0: cannot resolve column name when doing a second join

I have three tables that all share a key column, so I joined A and B to get D.

Now I want to finish by joining D with C.

The problem is that I get this error:

org.apache.spark.sql.AnalysisException: Cannot resolve column name "ClaimKey" among (_1, _2); 
    at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:219)

This is the actual code, from Zeppelin:

joinedperson.printSchema 
filteredtable.printSchema 
val joined = joinedperson.joinWith(filteredtable, 
    filteredtable.col("ClaimKey") === joinedperson.col("ClaimKey"))

These are the schemas of the two tables I am trying to join. The problem is with ClaimKey in the first schema.

root 
|-- _1: struct (nullable = false) 
| |-- clientID: string (nullable = true) 
| |-- PersonKey: string (nullable = true) 
| |-- ClaimKey: string (nullable = true) 
|-- _2: struct (nullable = false) 
| |-- ClientID: string (nullable = true) 
| |-- MyPersonKey: string (nullable = true) 
root 
|-- clientID: string (nullable = true) 
|-- ClaimType: string (nullable = true) 
|-- ClaimKey: string (nullable = true)

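I read the raw data from Parquet files, then used case classes to map the rows to classes, so I have Datasets.

I expect the error is caused by the tuples, so how do I do this join?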


2017-04-18 James Black

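Your first DataFrame has a nested structure: ClaimKey is a field inside another field (_1). To access such a field, give its path, with the parent and child field names separated by a dot: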

val joined = joinedperson.joinWith(filteredtable, 
    filteredtable.col("ClaimKey") === joinedperson.col("_1.ClaimKey"))
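
To see where the nesting comes from, here is a minimal sketch of both joins end to end. The case classes (Claim, Person, ClaimInfo), the sample rows, and the first join condition on PersonKey are assumptions made up for illustration; only the column names and the second join come from the question. It assumes Spark 2.x on the classpath and a local master.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case classes mirroring the schemas printed in the question.
case class Claim(clientID: String, PersonKey: String, ClaimKey: String)
case class Person(ClientID: String, MyPersonKey: String)
case class ClaimInfo(clientID: String, ClaimType: String, ClaimKey: String)

object NestedJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the Parquet data.
    val claims = Seq(Claim("c1", "p1", "k1")).toDS()
    val persons = Seq(Person("c1", "p1")).toDS()
    val filteredtable = Seq(ClaimInfo("c1", "medical", "k1")).toDS()

    // joinWith returns a Dataset of pairs, so the result has exactly two
    // top-level columns, _1 and _2, each holding one side as a struct.
    val joinedperson: Dataset[(Claim, Person)] =
      claims.joinWith(persons, claims.col("PersonKey") === persons.col("MyPersonKey"))

    // ClaimKey is no longer a top-level column of joinedperson;
    // it has to be addressed through its parent struct as _1.ClaimKey.
    val joined: Dataset[((Claim, Person), ClaimInfo)] =
      joinedperson.joinWith(filteredtable,
        filteredtable.col("ClaimKey") === joinedperson.col("_1.ClaimKey"))

    joined.printSchema()
    joined.show(false)

    spark.stop()
  }
}
```

In Zeppelin or spark-shell the session already exists, so the case classes and the body of main can be pasted without the surrounding object. Each further joinWith adds another level of nesting, so a column from the first table would then be reached as _1._1.ClaimKey.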


2017-04-18 17:40:36

Thank you very much. Such a simple solution.
