I want to print the contents of a file, and the code below is how I tried to do it. Is it failing to print because of the type mismatch (Unit vs. String)?
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
object SimpleSpark {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Simple Project")
    val sc = new SparkContext(conf)

    val distFile = sc.textFile("https://stackoverflow.com/a/path/to/a/file")
    val aClass: MyClass = new MyClass()
    val mappedRDD = aClass.doStuff(distFile)
    mappedRDD.reduce((a, b) => println(a))   // this line fails to compile
    // println(mappedRDD.reduce((a, b) => a + b + "\n"))
    // mappedRDD.foreach(println)
  }

  class MyClass() {
    def doStuff(rdd: RDD[String]): RDD[String] = {
      val field = "Hello!"
      rdd.map(x => field + x)
    }
  }
}
My question is: the two commented-out lines work fine, but the line mappedRDD.reduce((a, b) => println(a)) produces an error like this:
[email protected]:Apache-Spark$ sbt package
[info] Set current project to Simple Project (in build file:/home/cliu/Documents/github/Apache-Spark/)
[info] Compiling 1 Scala source to /home/cliu/Documents/github/Apache-Spark/target/scala-2.10/classes...
[error] /home/cliu/Documents/github/Apache-Spark/src/main/scala/SimpleSpark.scala:72: type mismatch;
[error] found : Unit
[error] required: String
[error] mappedRDD.reduce((a, b) => println(a))
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 3 s, completed Dec 2, 2015 5:18:24 PM
Why doesn't mappedRDD.reduce((a, b) => println(a)) work? Why do I get Unit here rather than String?
Do you mean that 'reduce' should always return something? – fluency03
Yes. A function like '(a, b) => a' must return a value of type 'String', the element type of the RDD. 'println(a)' has type 'Unit', so it doesn't work there. –
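For reference, the same typing rule can be demonstrated without Spark at all, since reduce on a plain Scala collection has the same shape, (T, T) => T. This is a minimal sketch (the names and sample values are illustrative, not from the original code):

```scala
// reduce's function must return the element type so that the running
// result can be fed back in as the next left-hand argument.
object ReduceDemo {
  def main(args: Array[String]): Unit = {
    val words = List("one", "two", "three")

    // OK: (a, b) => a + b + "\n" returns a String, the element type.
    val joined = words.reduce((a, b) => a + b + "\n")
    println(joined)

    // Does not compile: println(a) has type Unit, not String.
    // words.reduce((a, b) => println(a))

    // To print each element, use foreach, whose function returns Unit:
    words.foreach(println)
  }
}
```

The same distinction applies to the RDD in the question: reduce is for combining elements into one value of the same type, while foreach is the right tool for side effects such as printing.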