2017-03-09 58 views

I am new to Scala. I want to compare the current record with all the following records inside a Spark map. Sample data:

1,"jack",34.5 
2,"jackk",14.5 
3,"jacky",24.5 
4,"jack",64.5 
And many more. 

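I want to compare each field of the first record with the same field of every other record, then the second record with all the remaining records, and so on. (Please ignore the syntax.) I have written the following code: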

import org.apache.spark.sql.Row

val data = sc.parallelize(Seq((1,"jack",34.5), 
     (2,"jackk",14.5), 
     (3,"jacky",24.5), 
     (4,"jack",64.5)))

val res = data.map { f => 
  // This compares a field with itself, so it is always true.
  // What I want is to compare the current record with all the following records.
  val rr = f._1.equals(f._1)
  Row(rr) 
}

Example of the comparisons I want:

"jack" with "jackk" 
"jack" with "jacky" 
"jack" with "jack" 
"jackk" with "jacky" 
"jackk" with "jack" 
"jacky" with "jack" 

I am using .map because I want the code to execute on the cluster.

Any suggestions would be appreciated. Thanks in advance.

Could you consider pattern matching? – 2017-03-09 11:39:17

Please do not consider pattern matching.

Answer

Try something like this:

data.cartesian(data).map(pair => compare(pair._1, pair._2)) 

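Be aware, though, that the 'cartesian' operation generates N * N pairs, so its time and space cost grows quadratically with the number of records.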
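The plain cartesian product also pairs every record with itself and yields each pair twice (a with b, and b with a). One possible way to keep only "current record with all following records", as in the question's example, is to attach an index with zipWithIndex and filter on it. The following is only a sketch under assumptions: the `Record` alias, the `compare` function (here it just checks whether the names are equal) and the `comparePairs` helper are hypothetical names that do not appear in the original post, and the local master is used purely for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairwiseCompare {
  // Hypothetical record shape matching the sample data: (id, name, score).
  type Record = (Int, String, Double)

  // Hypothetical comparison: the original post never defines `compare`,
  // so this one simply checks whether the two names are equal.
  def compare(a: Record, b: Record): Boolean = a._2 == b._2

  // Returns ((idOfFirst, idOfSecond), comparisonResult) for every pair
  // in which the first record comes before the second one.
  def comparePairs(sc: SparkContext, rows: Seq[Record]): Array[((Int, Int), Boolean)] = {
    val indexed = sc.parallelize(rows).zipWithIndex() // RDD[(Record, Long)]
    indexed.cartesian(indexed)
      .filter { case ((_, i), (_, j)) => i < j }      // drop self-pairs and mirrored pairs
      .map { case ((a, _), (b, _)) => ((a._1, b._1), compare(a, b)) }
      .collect()
  }

  def main(args: Array[String]): Unit = {
    // Local master for illustration only; on a cluster the master is set by spark-submit.
    val conf = new SparkConf().setAppName("pairwise-compare").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rows = Seq((1, "jack", 34.5), (2, "jackk", 14.5), (3, "jacky", 24.5), (4, "jack", 64.5))
      comparePairs(sc, rows).sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The filter halves the number of comparisons that are actually evaluated, but the cartesian product is still formed first, so the quadratic growth mentioned above still applies for large inputs.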
