I am trying to implement a recommendation system using Spark collaborative filtering. Spark ML: failing to load a model with MatrixFactorizationModel.
First I train the model and save it to disk:
MatrixFactorizationModel model = trainModel(inputDataRdd);
model.save(jsc.sc(), "/op/tc/model/");
When I load the model from a separate process, the program fails with the exception below.
Code:
static JavaSparkContext jsc;
private static Options options;
static {
    SparkConf conf = new SparkConf().setAppName("TC recommender application");
    conf.set("spark.driver.allowMultipleContexts", "true");
    jsc = new JavaSparkContext(conf);
}
MatrixFactorizationModel model = MatrixFactorizationModel.load(jsc.sc(),
"/op/tc/model/");
Exception:
Exception in thread "main" java.io.IOException: Not a file: maprfs:/op/tc/model/data
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:324)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
    at org.apache.spark.rdd.RDD$$anonfun$aggregate$1.apply(RDD.scala:1114)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.aggregate(RDD.scala:1107)
    at org.apache.spark.mllib.recommendation.MatrixFactorizationModel.countApproxDistinctUserProduct(MatrixFactorizationModel.scala:96)
    at org.apache.spark.mllib.recommendation.MatrixFactorizationModel.predict(MatrixFactorizationModel.scala:126)
    at com.aexp.cxp.recommendation.ProductRecommendationIndividual.main(ProductRecommendationIndividual.java:62)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:742)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Is there any configuration I need to set in order to load the model? Any suggestion would be helpful.
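For context, note that the exception reports the path as maprfs:/op/tc/model/data even though the code passes "/op/tc/model/": Hadoop resolves scheme-less paths against the cluster's default filesystem, so the two processes may not be looking at the same location. The sketch below is Spark-free and only illustrates that resolution rule; the `qualify` helper is hypothetical, not part of Hadoop's API.

```java
import java.net.URI;

public class PathCheck {
    // Mimic how a scheme-less path is resolved against a default
    // filesystem URI (as Hadoop does with fs.defaultFS). Hypothetical
    // helper for illustration only.
    static String qualify(String defaultFs, String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() == null) {
            // No scheme: the path inherits the default filesystem.
            return URI.create(defaultFs).resolve(path).toString();
        }
        // Already fully qualified: left untouched.
        return path;
    }

    public static void main(String[] args) {
        // "/op/tc/model/" has no scheme, so on a MapR cluster it
        // resolves onto maprfs, not the local disk.
        System.out.println(qualify("maprfs:///", "/op/tc/model/"));
        // A fully qualified URI is used as-is.
        System.out.println(qualify("maprfs:///", "file:///tmp/model/"));
    }
}
```

So if the model was saved from one process and loaded from another, both must resolve "/op/tc/model/" to the same filesystem; writing the path fully qualified (e.g. with an explicit scheme) removes the ambiguity.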
I think it is pretty clear. Your file does not exist (at least not on the slaves, since we can see it is performing a map operation) – Dici
If I load the model in the same process that saved it, it does not complain :( –
On a side note, I would not recommend using 'allowMultipleContexts'. I have never seen it in the Spark configuration, which means it is probably still not well-supported enough to expose it to – Dici