Troubleshooting a Scala Spark configuration/environment. Running Windows 8.1, Java 1.8, Scala 2.10.5, Spark 1.4.1, Scala IDE (Eclipse 4.4), IPython 3.0.0, and Jupyter Scala.
I am relatively new to Scala and Spark, and I am running into a problem where certain RDD commands, such as collect and first, return a "Task not serializable" error. What is unusual to me is that I see the error in an IPython notebook with the Scala kernel, and in Scala IDE, but when I run the same code directly in spark-shell I do not get the error.
I would like to set up both environments for coding that is more advanced than shell evaluation. I have very little expertise in troubleshooting this kind of problem or knowing what to look for; any guidance on how to start tackling this kind of issue would be greatly appreciated.
Code:
val logFile = "s3n://[key:[key secret]@mortar-example-data/airline-data"
val sample = sc.parallelize(
  sc.textFile(logFile)
    .take(100)
    .map(line => line.replace("'", "").replace("\"", ""))
    .map(line => line.substring(0, line.length() - 1)))
val header = sample.first
val data = sample.filter(_ != header)
data.take(1)
data.count
data.collect
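For context on why the notebook environments might behave differently: as far as I understand, a REPL or notebook kernel compiles each cell into a generated wrapper class, so a top-level val like header becomes a field of that class. A closure such as _ != header then captures the whole wrapper instance, plus anything that instance references, rather than just the string. A minimal sketch of that capture pattern follows; the class names and header value are illustrative stand-ins, not the actual generated code:

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD

// Illustrative stand-in for an earlier notebook cell that holds a SparkConf.
class Cell12 {
  val conf = new SparkConf() // SparkConf is not serializable
}

// Illustrative stand-in for the cell that runs the filter.
class Cell49(val earlier: Cell12) {
  val header = "Year,Month,DayofMonth" // hypothetical header line

  def run(sample: RDD[String]): RDD[String] =
    // `header` is a field of this class, so the lambda captures `this`,
    // and serializing `this` drags in `earlier.conf` -- the SparkConf.
    sample.filter(_ != header)
}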
Stack trace:
org.apache.spark.SparkException: Task not serializable
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
org.apache.spark.rdd.RDD.filter(RDD.scala:310)
cmd49$$user$$anonfun$4.apply(Main.scala:188)
cmd49$$user$$anonfun$4.apply(Main.scala:187)
java.io.NotSerializableException: org.apache.spark.SparkConf
Serialization stack:
- object not serializable (class: org.apache.spark.SparkConf, value: org.apache.spark.SparkConf@…)
- field (class: cmd12$$user, name: conf, type: class org.apache.spark.SparkConf)
- object (class cmd12$$user, cmd12$$user@…)
- field (class: cmd49, name: $ref$cmd12, type: class cmd12$$user)
- object (class cmd49, cmd49@…)
- field (class: cmd49$$user, name: $outer, type: class cmd49)
- object (class cmd49$$user, cmd49$$user@…)
- field (class: cmd49$$user$$anonfun$4, name: $outer, type: class cmd49$$user)
- object (class cmd49$$user$$anonfun$4, <function0>)
- field (class: cmd49$$user$$anonfun$4$$anonfun$apply$3, name: $outer, type: class cmd49$$user$$anonfun$4)
- object (class cmd49$$user$$anonfun$4$$anonfun$apply$3, <function1>)
org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
org.apache.spark.rdd.RDD.filter(RDD.scala:310)
cmd49$$user$$anonfun$4.apply(Main.scala:188)
cmd49$$user$$anonfun$4.apply(Main.scala:187)
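Reading the serialization stack bottom-up: the filter closure (<function1>) points back through cmd49$$user and cmd49 to cmd12$$user, whose conf field is the non-serializable SparkConf. In other words, the failure is about what the notebook cell objects reference, not about the RDD code itself, which would explain why spark-shell (whose REPL has special handling for its generated wrapper objects) does not fail. A workaround often suggested for notebook kernels is to mark such REPL-level handles @transient so they are skipped when a closure accidentally captures the cell's wrapper; a sketch, assuming the context is created manually in a cell:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a notebook cell that creates its own context.
// @transient keeps these fields out of any serialized closure that
// captures the cell's wrapper object along the way.
@transient val conf = new SparkConf().setAppName("notebook").setMaster("local[*]")
@transient val sc = new SparkContext(conf)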
Why are you using sc.parallelize around sc.textFile?!? – eliasah
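On that comment: sc.textFile already returns an RDD, and take(100) collects 100 lines back to the driver as a local Array, so sc.parallelize is re-distributing data that was just pulled down. If the intent is simply a cleaned dataset, one alternative sketch keeps everything as a single RDD pipeline; note that this processes the whole file rather than only the first 100 lines:

// Clean the raw lines without a round-trip through the driver.
val cleaned = sc.textFile(logFile)
  .map(_.replace("'", "").replace("\"", ""))
  .map(line => line.substring(0, line.length - 1))
val header = cleaned.first
val data = cleaned.filter(_ != header)
data.take(5).foreach(println) // inspect a few rows on the driver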