
I can't understand reduceByKey(_ + _) in Spark with Scala.

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)
    val outputPath = args(1)
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    val lines = sc.textFile(inputPath)
    val wordCounts = lines.flatMap { line => line.split(" ") }
      .map(word => (word, 1))
      .reduceByKey(_ + _)   // <-- I can't understand this line
    wordCounts.saveAsTextFile(outputPath)
  }
}

Answers

9

The first thing to understand about reduceByKey(_ + _) in Spark is that it takes two elements, applies the function to those two arguments, and produces a third value.

The code you showed is equivalent to the following:

reduceByKey((x, y) => x + y)

Instead of defining dummy variables and writing out a lambda, Scala is smart enough to figure out that what you want is to apply the function (a sum, in this case) to any two arguments it receives, hence:

reduceByKey(_ + _) 
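
For instance, here is a minimal sketch of the same idea on made-up data (assuming a SparkContext named sc, e.g. from spark-shell; the pairs are hypothetical):

// Each word becomes a (word, 1) pair; reduceByKey(_ + _) then sums the 1s per key.
val pairs = sc.parallelize(Seq(("apple", 1), ("pear", 1), ("apple", 1), ("apple", 1)))
val counts = pairs.reduceByKey(_ + _)   // same as reduceByKey((x, y) => x + y)
counts.collect().foreach(println)       // e.g. (pear,1) and (apple,3)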
+1, thank you very much – Elsayed

0

reduceByKey takes two arguments, applies a function to them, and returns the result.

reduceByKey(_ + _) is equivalent to reduceByKey((x, y) => x + y)

Example:

val numbers = Array(1, 2, 3, 4, 5)
val sum = numbers.reduceLeft[Int](_ + _)  // fold left-to-right, adding pairs of elements

println("The sum of the numbers one through five is " + sum) 

Result:

The sum of the numbers one through five is 15 
numbers: Array[Int] = Array(1, 2, 3, 4, 5) 
sum: Int = 15 

Similarly, reduceByKey(_ ++ _) is equivalent to reduceByKey((x, y) => x ++ y).
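
For example, a minimal sketch with made-up data (again assuming a SparkContext named sc) where the values are sequences, so ++ concatenates them per key:

// reduceByKey(_ ++ _) merges the value sequences that share the same key.
val pairs = sc.parallelize(Seq(("a", Seq(1)), ("a", Seq(2, 3)), ("b", Seq(4))))
val merged = pairs.reduceByKey(_ ++ _)  // same as reduceByKey((x, y) => x ++ y)
merged.collect().foreach(println)       // e.g. (a,List(1, 2, 3)) and (b,List(4))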