我定義了兩個表是這樣的:Left反加入Spark?
val tableName = "table1"
val tableName2 = "table2"
val format = new SimpleDateFormat("yyyy-MM-dd")
val data = List(
List("mike", 26, true),
List("susan", 26, false),
List("john", 33, true)
)
val data2 = List(
List("mike", "grade1", 45, "baseball", new java.sql.Date(format.parse("1957-12-10").getTime)),
List("john", "grade2", 33, "soccer", new java.sql.Date(format.parse("1978-06-07").getTime)),
List("john", "grade2", 32, "golf", new java.sql.Date(format.parse("1978-06-07").getTime)),
List("mike", "grade2", 26, "basketball", new java.sql.Date(format.parse("1978-06-07").getTime)),
List("lena", "grade2", 23, "baseball", new java.sql.Date(format.parse("1978-06-07").getTime))
)
val rdd = sparkContext.parallelize(data).map(Row.fromSeq(_))
val rdd2 = sparkContext.parallelize(data2).map(Row.fromSeq(_))
val schema = StructType(Array(
StructField("name", StringType, true),
StructField("age", IntegerType, true),
StructField("isBoy", BooleanType, false)
))
val schema2 = StructType(Array(
StructField("name", StringType, true),
StructField("grade", StringType, true),
StructField("howold", IntegerType, true),
StructField("hobby", StringType, true),
StructField("birthday", DateType, false)
))
val df = sqlContext.createDataFrame(rdd, schema)
val df2 = sqlContext.createDataFrame(rdd2, schema2)
df.createOrReplaceTempView(tableName)
df2.createOrReplaceTempView(tableName2)
我試圖建立查詢到從沒有匹配的行表2表1返回行。 我嘗試使用此查詢做到這一點:
Select * from table1 LEFT JOIN table2 ON table1.name = table2.name AND table1.age = table2.howold AND table2.name IS NULL AND table2.howold IS NULL
但這只是給了我從表1中的所有行:
列表({「名」:「約翰」,「年齡」: 33,isBoy:true}, {「name」:「susan」,「age」:26,「isBoy」:false}, {「name」:「mike」,「age」:26,「isBoy 「:true})
如何在Spark中有效地進行這種連接?
我正在查找SQL查詢,因爲我需要能夠指定要在兩個表之間進行比較的列,而不僅僅是比較像其他推薦問題那樣逐行比較。像使用減法,除了等。
的可能的複製[火花:減去兩個DataFrames](http://stackoverflow.com/questions/29537564/spark-subtract-two-dataframes) –
根據您的編輯和評論我的答案,我認爲你正在尋找: http://stackoverflow.com/questions/29537564/spark-subtract-two-dataframes值得注意的是@Interfector對第一個回答的評論 –