2015-11-06

I have the following three classes, and when passing the RDD I get an org.apache.spark.SparkException:

Task not serializable

error. The full stack trace is below.

The first class is the serializable Person:

public class Person implements Serializable
{
    private String name;
    private int age;

    public String getName()
    {
        return name;
    }

    public void setName(String name)
    {
        this.name = name;
    }

    public int getAge()
    {
        return age;
    }

    public void setAge(int age)
    {
        this.age = age;
    }
}

This class reads from the text file and maps each line to a Person:

public class sqlTestv2 implements Serializable
{
    private int appID;
    private int dataID;
    private JavaSparkContext sc;

    public sqlTestv2(int appID, int dataID, JavaSparkContext sc)
    {
        this.appID = appID;
        this.dataID = dataID;
        this.sc = sc;
    }

    public JavaRDD<Person> getDataRDD()
    {
        JavaRDD<String> test = sc.textFile("hdfs://localhost:8020/user/cloudera/people.txt");

        JavaRDD<Person> people = test.map(
            new Function<String, Person>() {
                public Person call(String line) throws Exception {
                    String[] parts = line.split(",");

                    Person person = new Person();
                    person.setName(parts[0]);
                    person.setAge(Integer.parseInt(parts[1].trim()));

                    return person;
                }
            });

        return people;
    }
}

This class retrieves the RDD and operates on it:

public class sqlTestv1 implements Serializable
{
    public static void main(String[] arg) throws Exception
    {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("wordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
        sqlTestv2 v2 = new sqlTestv2(1, 1, sc);
        JavaRDD<Person> test = v2.getDataRDD();

        DataFrame schemaPeople = sqlContext.createDataFrame(test, Person.class);
        schemaPeople.registerTempTable("people");

        DataFrame df = sqlContext.sql("SELECT age FROM people");

        df.show();
    }
}

The full error:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:294)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:293)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
    at org.apache.spark.rdd.RDD.map(RDD.scala:293)
    at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:90)
    at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:47)
    at com.oreilly.learningsparkexamples.mini.java.sqlTestv2.getDataRDD(sqlTestv2.java:54)
    at com.oreilly.learningsparkexamples.mini.java.sqlTestv1.main(sqlTestv1.java:41)
Caused by: java.io.NotSerializableException: org.apache.spark.api.java.JavaSparkContext
Serialization stack:
    - object not serializable (class: org.apache.spark.api.java.JavaSparkContext, value: [email protected])
    - field (class: com.oreilly.learningsparkexamples.mini.java.sqlTestv2, name: sc, type: class org.apache.spark.api.java.JavaSparkContext)
    - object (class com.oreilly.learningsparkexamples.mini.java.sqlTestv2, [email protected])
    - field (class: com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1, name: this$0, type: class com.oreilly.learningsparkexamples.mini.java.sqlTestv2)
    - object (class com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1, [email protected])
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, )
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)

Answers


The stack trace tells you the answer: it's the JavaSparkContext you're passing to sqlTestv2. You should pass sc into the method, not store it on the class.
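To see why storing sc on the class is the problem, note the `this$0` field in the serialization stack: an anonymous inner class always holds a hidden reference to its enclosing instance, so serializing the Function also tries to serialize the whole sqlTestv2 object, including the non-serializable JavaSparkContext. Here is a minimal plain-JDK sketch (no Spark; `FakeContext`, `Outer`, and `StandaloneFn` are made-up names) showing the same mechanism:

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureCaptureDemo {

    // Stand-in for JavaSparkContext: a field type that is NOT serializable.
    static class FakeContext { }

    static class Outer implements Serializable {
        private FakeContext sc = new FakeContext(); // non-serializable field

        // An anonymous inner class keeps a hidden this$0 reference to Outer,
        // so serializing it also walks Outer and hits the sc field.
        Serializable capturingFunction() {
            return new Serializable() { };
        }
    }

    // A static nested class has no outer reference, so it serializes cleanly.
    static class StandaloneFn implements Serializable { }

    static Serializable standaloneFunction() {
        return new StandaloneFn();
    }

    // Returns true if Java serialization of o succeeds.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("anonymous inner class serializes: "
                + serializes(new Outer().capturingFunction()));   // false: drags in sc via this$0
        System.out.println("static nested class serializes:   "
                + serializes(standaloneFunction()));              // true: no outer reference
    }
}
```

This is why the usual fix is to make the Function a static nested (or standalone) class, or to keep the non-serializable context out of any object the closure can reach, e.g. by passing sc as a method parameter instead of a field.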


You can add the `transient` modifier to sc so that it is not serialized.
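A `transient` field is simply skipped by Java serialization and comes back as null after deserialization, which is why it silences the exception. A small plain-JDK sketch (`Holder` is a made-up class; `Thread` stands in for any non-serializable type like JavaSparkContext):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    static class Holder implements Serializable {
        String name = "kept";
        transient Thread worker = new Thread(); // transient: skipped during serialization

        // Serialize this object to bytes and read it back.
        Holder roundTrip() throws Exception {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(this);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Holder) in.readObject();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Holder copy = new Holder().roundTrip();
        System.out.println(copy.name);           // "kept": normal fields survive
        System.out.println(copy.worker == null); // true: transient fields come back null
    }
}
```

Note the caveat: with sc marked transient, the field will be null inside the deserialized object on the executors, so this only works because getDataRDD uses sc on the driver side, before the closure is shipped.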
