2017-09-25 95 views

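Find the differences between two DataFrames based on a primary key in Spark Scala

I have two DataFrames in Spark. I call df1.except(df2) to find whether any column has changed between the two DataFrames.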

DF1 looks like this:

|001000900|aaaaa BELLOWS CORPORATION||N| 
|001000905|ddddd DEPARTMENT OF LABOR AND EMPLOYMENT SECURITY|BUREAU OF COMPLIANCE|N| 
|001001049|gggg RAVIOLI MFG CO INC|SPINELLI BKY RAVIOLI PASTRY SP|N| 
|001001130|dddd ANGELES UNIFIED SCHOOL DISTRICT|TRANSPORTATION BRANCH|N| 
|001001143|ffff MUSIC PARTIES, INC||N| 
|001001155|BOSTON BRASS AND IRON CO||N| 
|001001171|HANCOCK MARINE, INC.||N| 
|001001184|TRILLION CORPORATION||N| 
|001001192|HAWAII STATE CHIROPRACTIC ASSOCIATION INC||N| 
|001001379|THE FRUIT SQUARE PEOPLE INC|L & M BAKERY|N| 
|001001416|J & S MARKET||N| 

DF2 looks like this:

|001000145|PARADISE TAN||N| 
|001000306|SHRUT & ASCH LEATHER COMPANY, INC.||N| 
|001000355|HARRISON SPECIALTY CO., INC.||N| 
|001000363|LOUIS M. GERSON CO., INC.||N| 
|001000467|SAVE THE SEA TURTLES INTERNATIONAL|ADOPT THE BEACH HI|N| 
|001000504|DIRIGO SPICE CORPORATION|CUNNINGHAM SPICE|N| 
|001000744|FREEDMAN THREAD COMPANY|COLONIAL THREAD CO|N| 
|001000756|AFFORDABLE AIR CONDITIONING|P R ENTERPRISE|N| 
|001000900|CLIFLEX BELLOWS CORPORATION||N| 
|001000905|FLORIDA DEPARTMENT OF LABOR AND EMPLOYMENT SECURITY|BUREAU OF COMPLIANCE|N| 
|001001049|SPINELLI RAVIOLI MFG CO INC|SPINELLI BKY RAVIOLI PASTRY SP|N| 
|001001130|LOS ANGELES UNIFIED SCHOOL DISTRICT|TRANSPORTATION BRANCH|N| 
|001001143|TOSCO MUSIC PARTIES, INC||N| 
|001001155|BOSTON BRASS AND IRON CO||N| 

What I actually want, though, is the difference between the two DataFrames based on one column, the key.

I want my output to look like this:

|dunsnumber|filler1|  businessname|  tradestylename|registeredaddressindicator| 
+----------+-------+--------------------+--------------------+--------------------------+ 
| 001001130|  |dddd ANGELES UNIF...|TRANSPORTATION BR...|       N| 
| 001000900|  |aaaaa BELLOWS COR...|     |       N| 
| 001000905|  |ddddd DEPARTMENT ...|BUREAU OF COMPLIANCE|       N| 
| 001001143|  |ffff MUSIC PARTIE...|     |       N| 
| 001001049|  |gggg RAVIOLI MFG ...|SPINELLI BKY RAVI...|       N| 
+----------+-------+--------------------+--------------------+--------------------------+

Here is my code:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._

// `schema` is the StructType describing these files; its definition is not shown here.
val textRdd1 = sc.textFile("/home/cloudera/TRF/PCFP/INCR")
val rowRdd1 = textRdd1.map(line => Row.fromSeq(line.split("\\|", -1)))
val df1 = sqlContext.createDataFrame(rowRdd1, schema)

val textRdd2 = sc.textFile("/home/cloudera/TRF/PCFP/MAIN")
val rowRdd2 = textRdd2.map(line => Row.fromSeq(line.split("\\|", -1)))
val df2 = sqlContext.createDataFrame(rowRdd2, schema)

// My attempt: except() followed by a filter on the key column
val diffAnyColumnDF = df1.except(df2).where(df1.col("dunsnumber") ===
  df2.col("dunsnumber")).show()

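So, for my primary key "dunsnumber": I only want the rows where the primary key matches in both DataFrames and any other column has changed. Rows whose primary key has no match should not be returned.

I hope my question is clear.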

You should join them on the key, then use filter or select to apply your filtering logic. :) –

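Do you want a subtract or a simple except? That is, should the resulting DataFrame contain rows from DF1 only, or from both DF1 and DF2? –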

@Avishek If the primary key is the same in both, then for that matching key, if any column value is different, I need that row's values. – SUDARSHAN
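
The join-and-filter approach suggested in the comments above could look roughly like the sketch below. `changedByKey` is a hypothetical helper name, not something from the original thread. The sketch assumes both DataFrames share the same column names, that the key is unique within each DataFrame, and that there is at least one non-key column.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

// Hypothetical helper (not from the original post): returns the rows of `left`
// whose key also exists in `right` but where at least one non-key column differs.
def changedByKey(left: DataFrame, right: DataFrame, key: String): DataFrame = {
  val l = left.as("l")
  val r = right.as("r")
  val valueCols = left.columns.filter(_ != key)

  // Null-safe comparison: a column counts as changed unless both sides are
  // equal or both are null. Assumes valueCols is non-empty.
  val anyDiff: Column = valueCols
    .map(c => !(col(s"l.$c") <=> col(s"r.$c")))
    .reduce(_ || _)

  l.join(r, col(s"l.$key") === col(s"r.$key"))          // inner join: matching keys only
    .where(anyDiff)                                     // keep rows with a changed column
    .select(left.columns.map(c => col(s"l.$c")): _*)    // return only the left-hand columns
}
```

With the DataFrames from the question, this would be called as `changedByKey(df1, df2, "dunsnumber").show()`.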

Answers

Hi, this worked for me:

val diffAnyColumnDF = df1.except(df2)
val addDF = diffAnyColumnDF.join(df2, Seq("dunsnumber"))  // show() returns Unit, so call it separately
addDF.show()
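
An inner join on "dunsnumber" like the one above also appends df2's other columns to every row. If only the df1 columns are wanted, as in the desired output from the question, one possible variant is a left semi join. This is a sketch; `changedRows` is a hypothetical helper name, and it assumes both DataFrames have the same schema (which `except` requires anyway).

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not from the original answer).
def changedRows(df1: DataFrame, df2: DataFrame, key: String): DataFrame =
  df1.except(df2)                                  // rows of df1 with no identical row in df2
    .join(df2.select(key), Seq(key), "leftsemi")   // keep only rows whose key also exists in df2
```

Usage would be `changedRows(df1, df2, "dunsnumber").show()`. Note that `except` has distinct semantics, so exact duplicate rows in df1 collapse to one.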

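DataFrame has no subtract method. However, you can take another route: convert the data to an RDD, use the RDD's subtract method, and then convert the result back to a DataFrame.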
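
A minimal sketch of that round trip is below. `subtractViaRdd` is a hypothetical helper name, and the sketch assumes both DataFrames have the same schema. Like `except`, it compares whole rows, so on its own it does not restrict the result to keys that exist in both DataFrames.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not from the original answer).
def subtractViaRdd(df1: DataFrame, df2: DataFrame): DataFrame = {
  // RDD[Row] containing the rows of df1 that do not appear in df2
  val remaining = df1.rdd.subtract(df2.rdd)
  // Rebuild a DataFrame using df1's schema
  df1.sqlContext.createDataFrame(remaining, df1.schema)
}
```

Usage would be `subtractViaRdd(df1, df2).show()`.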