2017-08-16

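XGBoost does not generate as many trees as specified in the num_round parameter

This is not a bug report but something I would like to understand. When I call getModelDump on the Booster object, I do not get as many trees as the "num_round" parameter specifies. My assumption was that if "num_round" is 100, XGBoost builds 100 trees sequentially, and that I would see all of them when calling getModelDump. I am sure there is a logical reason behind this, or my understanding is wrong. Could you explain what is going on?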

    import ml.dmlc.xgboost4j.scala.spark.{XGBoostClassificationModel, XGBoostEstimator}
    import org.apache.spark.ml.Pipeline

    val paramMap = List(
      "eta" -> 0.1, "max_depth" -> 7, "objective" -> "binary:logistic", "num_round" -> 100,
      "eval_metric" -> "auc", "nworkers" -> 8).toMap
    val xgboostEstimator = new XGBoostEstimator(paramMap)
    // trainModel is another set of standard Spark feature stages (StringIndexer, one-hot encoding, VectorAssembler)
    val pipelineXGBoost = new Pipeline().setStages(Array(trainModel, xgboostEstimator))
    val cvModel = pipelineXGBoost.fit(train)
    // The call below yields only 2 trees instead of 100, even though num_round is 100
    println(cvModel.stages(1).asInstanceOf[XGBoostClassificationModel].booster.getModelDump()(0))
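For anyone checking the tree count, here is a minimal sketch of one way to do it. It assumes that getModelDump returns an Array[String] with one entry per tree, so the array length is the number of trees, while indexing with (0) selects only the first tree's dump. The names DumpInspector, treeCount and describe are hypothetical helpers, and the sample strings are stand-in data, not output from a real model.

```scala
// Sketch only: summarizes a model dump instead of printing just element 0.
// Assumption: `dump` is the Array[String] returned by Booster.getModelDump(),
// holding one text entry per tree.
object DumpInspector {
  /** Number of trees, taken as the number of entries in the dump array. */
  def treeCount(dump: Array[String]): Int = dump.length

  /** One summary line per tree: its index and its number of non-empty dump lines. */
  def describe(dump: Array[String]): Seq[String] =
    dump.zipWithIndex.map { case (tree, i) =>
      val lineCount = tree.split("\n").count(_.trim.nonEmpty)
      s"tree $i: $lineCount dump lines"
    }.toSeq

  def main(args: Array[String]): Unit = {
    // Stand-in data shaped like a text dump; NOT real model output.
    val fakeDump = Array(
      "0:[f0<0.5] yes=1,no=2,missing=1\n\t1:leaf=0.1\n\t2:leaf=-0.1\n",
      "0:leaf=0.05\n"
    )
    println(s"trees in dump: ${treeCount(fakeDump)}")
    describe(fakeDump).foreach(println)
  }
}
```

With the pipeline from the question, the same helper could be applied as DumpInspector.treeCount(cvModel.stages(1).asInstanceOf[XGBoostClassificationModel].booster.getModelDump()), which reports how many trees the dump contains regardless of which single element is printed.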

Link to the GitHub issue: https://github.com/dmlc/xgboost/issues/2610

The versions used are as follows, with Scala 2.11:

    "ml.dmlc" % "xgboost4j" % "0.7",
    "ml.dmlc" % "xgboost4j-spark" % "0.7",
    "org.apache.spark" %% "spark-core" % "2.2.0",
    "org.apache.spark" %% "spark-sql" % "2.2.0",
    "org.apache.spark" %% "spark-graphx" % "2.2.0",
    "org.apache.spark" %% "spark-mllib" % "2.2.0",

Answers