Here is one way to get the uncommon rows between two DataFrames, i.e. the rows that appear in only one of them:
val d1 = Seq((3, "Chennai", "rahman", "9848022330", 45000, "SanRamon"), (1, "Hyderabad", "ram", "9848022338", 50000, "SF"), (2, "Hyderabad", "robin", "9848022339", 40000, "LA"), (4, "sanjose", "romin", "9848022331", 45123, "SanRamon"))
val d2 = Seq((3, "Chennai", "rahman", "9848022330", 45000, "SanRamon"), (1, "Hyderabad", "ram", "9848022338", 50000, "SF"), (2, "Hyderabad", "robin", "9848022339", 40000, "LA"), (4, "sanjose", "romin", "9848022331", 45123, "SanRamon"), (4, "sanjose", "romino", "9848022331", 45123, "SanRamon"), (5, "LA", "Test", "1234567890", 12345, "Testuser"))
val df1 = d1.toDF("emp_id", "emp_city", "emp_name", "emp_phone", "emp_sal", "emp_site")
val df2 = d2.toDF("emp_id", "emp_city", "emp_name", "emp_phone", "emp_sal", "emp_site")
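// Register both DataFrames as temporary views so the SQL below can refer to them by name
df1.createOrReplaceTempView("df1")
df2.createOrReplaceTempView("df2")
spark.sql("((select * from df1) union (select * from df2)) minus ((select * from df1) intersect (select * from df2))").show // spark is the SparkSession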
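The same "union minus intersect" idea can also be written with the DataFrame API instead of a SQL string. The following is only a minimal sketch: the object name `UncommonRows`, the helper `uncommonRows`, and the two-column sample data are made up for illustration, and it assumes both DataFrames have the same columns in the same order (`union` matches columns by position).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UncommonRows {
  // Hypothetical helper: rows that appear in exactly one of the two DataFrames.
  // except() has EXCEPT DISTINCT semantics, so duplicates are dropped from the result.
  def uncommonRows(a: DataFrame, b: DataFrame): DataFrame =
    a.union(b).except(a.intersect(b))

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .appName("uncommon-rows")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // needed for toDF outside spark-shell

    // Small illustrative data set (not the one from the answer above)
    val left  = Seq((1, "ram"), (2, "robin")).toDF("emp_id", "emp_name")
    val right = Seq((1, "ram"), (2, "robin"), (5, "Test")).toDF("emp_id", "emp_name")

    // Only the (5, "Test") row is present in just one of the two DataFrames
    uncommonRows(left, right).show()

    spark.stop()
  }
}
```

With the `df1` and `df2` defined above, `UncommonRows.uncommonRows(df1, df2)` should select the same rows as the SQL query, without needing temporary views.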
Could you let me know how to declare df1 and df2? I declared them as follows: sqlContext = SQLContext(sc) df = sqlContext.sql("select * from table1") df2 = sqlContext.sql("select * from table2") and then copied the above code as is ... but I am getting syntax errors. I am very new to Spark Scala code.
Can you tell me what I am doing wrong? When I try to run the code below I get the errors "not found: value df1" and "not found: value df2": import org.apache.spark.sql.{DataFrame, SQLContext} import org.apache.spark.sql.functions._ val sc: SparkContext val sqlContext = new org.apache.spark.sql.SQLContext(sc) sqlContext = SQLContext(sc) df1 = sqlContext.sql("select * from table1") df2 = sqlContext.sql("select * from table2") diff("tenant", df1, df2) def diff(key: String, df1: DataFrame, df2: DataFrame): DataFrame = { ...... } /// the provided function code
Hi, I have added a short example.