2017-04-10

Spark SQL 1.5.2: Left excluding join

Given DataFrames df_a and df_b, how can I achieve the same result as the following left excluding join:

SELECT df_a.* 
FROM df_a 
    LEFT JOIN df_b 
    ON df_a.id = df_b.id 
WHERE df_b.id is NULL 

I have tried:

df_a.join(df_b, df_a("id")===df_b("id"), "left") 
    .select($"df_a.*") 
    .where(df_b.col("id").isNull) 

I get the following exception from the code above:

Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit() 

Answers


You could just run the SQL query itself, which keeps things simple:

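# Register both DataFrames as temporary tables so they can be queried with SQL
df_a.registerTempTable("TableA")
df_b.registerTempTable("TableB")

# Keep only A's columns for the rows of A that have no matching id in B
result = sqlContext.sql("""
    SELECT A.*
    FROM TableA A
    LEFT JOIN TableB B ON A.id = B.id
    WHERE B.id IS NULL
""")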

If you want to do it with the DataFrame API instead, try the example below:

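import sqlContext.implicits._
import org.apache.spark.sql.functions._

val df1 = sc.parallelize(List("a", "b", "c")).toDF("key1")
val df2 = sc.parallelize(List("a", "b")).toDF("key2")

// Left outer join, then keep only the rows of df1 that found no match in df2.
// <=> is null-safe equality: unlike ===, a null key1 would match a null key2.
df1.join(df2, df1.col("key1") <=> df2.col("key2"), "left")
  .filter(col("key2").isNull)
  .show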

You will get the output:

+----+----+
|key1|key2|
+----+----+
|   c|null|
+----+----+
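The output above still contains df2's key2 column, whereas the question's SQL selects only df_a.*. A minimal sketch of how the same approach might be adapted, assuming Spark 1.5.x, a local master, and hypothetical sample data in which both DataFrames share an `id` column, is to project the left DataFrame's columns back out after the join. `LeftExcludingJoin` and `leftExcluding` are illustrative names, not part of Spark:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object LeftExcludingJoin {
  // Hypothetical helper: rows of `left` whose `key` has no match in `right`,
  // keeping only the columns of `left` (the equivalent of SELECT df_a.*).
  def leftExcluding(left: DataFrame, right: DataFrame, key: String): DataFrame =
    left
      .join(right, left(key) === right(key), "left_outer")
      // Unmatched rows get null in every column coming from `right`
      .where(right(key).isNull)
      // Drop right's columns by selecting left's columns explicitly
      .select(left.columns.map(c => left(c)): _*)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("left-excluding-join").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical sample data: df_a has ids 1..3, df_b only has 1 and 2
    val dfA = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c"))).toDF("id", "value")
    val dfB = sc.parallelize(Seq(1, 2)).toDF("id")

    leftExcluding(dfA, dfB, "id").show()
    sc.stop()
  }
}
```

With this sample data, `main` should print only the row with id 3, with just the `id` and `value` columns. On Spark 2.0 and later (not 1.5.2), the built-in anti join does the same thing directly: `df_a.join(df_b, Seq("id"), "left_anti")`.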