Join a key-value pair with a key-Map pair
I have this dataset:
(apple,1)
(banana,4)
(orange,3)
(grape,2)
(watermelon,2)
and the other dataset is:
(apple,Map(Bob -> 1))
(banana,Map(Chris -> 1))
(orange,Map(John -> 1))
(grape,Map(Smith -> 1))
(watermelon,Map(Phil -> 1))
I am aiming to combine the two sets to get:
(apple,1,Map(Bob -> 1))
(banana,4,Map(Chris -> 1))
(orange,3,Map(John -> 1))
(grape,2,Map(Smith -> 1))
(watermelon,2,Map(Phil -> 1))
This is the code I have:
...
val counts_firstDataset = words.map(word =>
(word.firstWord, 1)).reduceByKey{case (x, y) => x + y}
Second dataset:
...
val counts_secondDataset = secondSet.map(x => (x._1,
x._2.toList.groupBy(identity).mapValues(_.size)))
I tried to use the join method:
val joined_data = counts_firstDataset.join(counts_secondDataset)
but it did not work, because join requires a pair [K, V]. How would I solve this?
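For reference, here is a minimal sketch of the join described above, assuming both datasets are Spark RDDs whose elements are statically typed as 2-tuples: RDD[(String, Int)] and RDD[(String, Map[String, Int])]. The object name JoinSketch, the helper joinFlat, and the local SparkContext setup are hypothetical and only for illustration. On pair RDDs, join yields (key, (left, right)), so one extra map is needed to flatten each element into the (key, count, Map) shape shown earlier.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical illustration object; not part of the original code.
object JoinSketch {

  // join is only available when the element type is a 2-tuple (K, V).
  // It returns (key, (count, names)); the map flattens that into a 3-tuple.
  def joinFlat(
      counts: RDD[(String, Int)],
      names: RDD[(String, Map[String, Int])]
  ): RDD[(String, Int, Map[String, Int])] =
    counts.join(names).map { case (key, (count, nameCounts)) =>
      (key, count, nameCounts)
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark context, used here only to run the sketch.
    val conf = new SparkConf().setAppName("join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val counts = sc.parallelize(Seq(
        ("apple", 1), ("banana", 4), ("orange", 3), ("grape", 2), ("watermelon", 2)))
      val names = sc.parallelize(Seq(
        ("apple", Map("Bob" -> 1)), ("banana", Map("Chris" -> 1)),
        ("orange", Map("John" -> 1)), ("grape", Map("Smith" -> 1)),
        ("watermelon", Map("Phil" -> 1))))

      // The order of the joined records is not guaranteed.
      joinFlat(counts, names).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If the compiler reports that join is not a member of the RDD, it may be worth checking that both values really have a tuple element type (for example RDD[(String, Int)]) rather than a wider type such as RDD[Any] or RDD[Product].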
@philantrovert RDDs –
Got it. I should have read the question completely. – philantrovert
What data structure are you using to store these datasets? List, Set, etc.? – fcat