2016-09-01

Spark flatMap gives iterator error

If I apply a flatMap from JSONArray to JSONObject, it runs fine when I run it locally (on my laptop) from Eclipse, but when running on the cluster (YARN) it gives a strange error. Spark version: 2.0.0.

Code:

JavaRDD<JSONObject> rdd7 = rdd6.flatMap(new FlatMapFunction<JSONArray, JSONObject>(){ 
    @Override 
    public Iterable<JSONObject> call(JSONArray array) throws Exception { 
     List<JSONObject> list = new ArrayList<JSONObject>(); 
     for (int i = 0; i < array.length();list.add(array.getJSONObject(i++)));    
     return list; 
    } 
}); 

Error log:

java.lang.AbstractMethodError: com.pwc.spark.tifcretrolookup.TIFCRetroJob$2.call(Ljava/lang/Object;)Ljava/util/Iterator; 
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124) 
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124) 
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) 
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) 
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) 
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) 
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) 
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) 
    at com.pwc.spark.ElasticsearchClientLib.CommonESClient.index(CommonESClient.java:33) 
    at com.pwc.spark.ElasticsearchClientLib.ESClient.call(ESClient.java:34) 
    at com.pwc.spark.ElasticsearchClientLib.ESClient.call(ESClient.java:15) 
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218) 
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218) 
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:883) 
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:883) 
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897) 
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897) 
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) 
    at org.apache.spark.scheduler.Task.run(Task.scala:85) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 

Do you have Scala 2.11 installed? –

Answer

Since Spark 2.0.0, the function passed to flatMap must return an Iterator instead of an Iterable, as the release notes state:

Java RDD’s flatMap and mapPartitions functions used to require functions returning Java Iterable. They have been updated to require functions returning Java iterator so the functions do not need to materialize all the data.

這裏是the relevant Jira issue
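
The fix, then, is to change the call signature to return Iterator<JSONObject> and hand back list.iterator(). A sketch of the corrected snippet, assuming the same rdd6 and the org.json JSONArray/JSONObject classes from the question:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.json.JSONArray;
import org.json.JSONObject;

JavaRDD<JSONObject> rdd7 = rdd6.flatMap(new FlatMapFunction<JSONArray, JSONObject>() {
    @Override
    public Iterator<JSONObject> call(JSONArray array) throws Exception {
        List<JSONObject> list = new ArrayList<JSONObject>();
        for (int i = 0; i < array.length(); i++) {
            list.add(array.getJSONObject(i));
        }
        // Spark 2.0.0+ requires an Iterator here, not an Iterable
        return list.iterator();
    }
});
```

The AbstractMethodError appears only on the cluster when the compiled class implements the old Iterable-returning signature while the Spark 2.0.0 runtime expects the Iterator-returning one.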


This doesn't seem to work in 2.1.0. Did this change again somewhere in a minor version? I have spark-core_2.11:2.1.0 on my classpath, and when I try this I get the error: (31,25) java: method flatMap in class org.apache.spark.rdd.RDD cannot be applied to given types; required: scala.Function1<...>, scala.reflect.ClassTag<...>; found: (x) -> Array[...] TOR(); reason: cannot infer type-variable(s) U (actual and formal argument lists differ in length) – bearrito


@bearrito Can you post the relevant section of your code? –