2016-04-03

Error when saving a random forest model in a Spark cluster (Scala)

I get the following error when saving a random forest model to disk. Spark cluster configuration: Spark package spark-1.6.0-bin-hadoop2.6, standalone mode.

I am running Spark with the same data replicated on each worker node.

Command: localModel.save(SlapSparkContext.get(), path). The model has been trained and predicts correctly on test data.

Error trace:

java.lang.NullPointerException
    at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:456)
    at org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:420)
    at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
    at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
    at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:329)
    at org.apache.spark.mllib.tree.model.TreeEnsembleModel$SaveLoadV1_0$.save(treeEnsembleModels.scala:453)
    at org.apache.spark.mllib.tree.model.RandomForestModel.save(treeEnsembleModels.scala:65)


Do you get the same error if you replace localModel.save(...) with localModel.count? – eliasah


eliasah - I cannot find any method named count. I am using spark-mllib_2.10, version 1.6 –


Then call count on the RF algorithm's input data before running the algorithm. – eliasah

Answer


The error occurs when Spark tries to save the DataFrame. Check whether the steps before this line of code are filtering/reducing your records down to an empty set.
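Following that advice, one way to protect the save call is to count the records first and skip the save when the input is empty. This is a minimal, hypothetical sketch: safeSave is not a Spark API, and the record-count/save callback pairing is an assumption about how the asker's pipeline is wired; in practice recordCount would come from trainingData.count() and the callback would wrap model.save(sc, path).

```scala
// Hypothetical guard (not a Spark API): only invoke the save
// callback when upstream filtering left at least one record.
def safeSave(recordCount: Long, saveModel: () => Unit): Boolean = {
  if (recordCount > 0) {
    saveModel() // e.g. () => model.save(sc, path) in the asker's code
    true
  } else {
    // An empty input is a likely cause of the Parquet NPE above.
    println(s"Refusing to save: $recordCount training records")
    false
  }
}
```

A usage sketch against the asker's names would be safeSave(trainingData.count(), () => localModel.save(SlapSparkContext.get(), path)).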