Here is how I do it:
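import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.DecisionTreeClassificationModel
import org.apache.spark.ml.feature.{HashingTF, IDFModel, Word2VecModel}
import org.apache.spark.ml.tuning.ParamGridBuilder

val pipeline = new Pipeline()
  .setStages(Array(tokenizer, stopWordsFilter, tf, idf, word2Vec, featureVectorAssembler, categoryIndexerModel, classifier, categoryReverseIndexer))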
...
val paramGrid = new ParamGridBuilder()
  .addGrid(tf.numFeatures, Array(10, 100))
  .addGrid(idf.minDocFreq, Array(1, 10))
  .addGrid(word2Vec.vectorSize, Array(200, 300))
  .addGrid(classifier.maxDepth, Array(3, 5))
  .build()
paramGrid.size // 16 entries
...
// Average cross-validation metric for each paramGrid entry
val avgMetricsParamGrid = crossValidatorModel.avgMetrics
// Pair each ParamMap with its average metric to see how the parameters affect it
val combined = paramGrid.zip(avgMetricsParamGrid)
...
val bestModel = crossValidatorModel.bestModel.asInstanceOf[PipelineModel]
// Explain params for each stage (indices follow the order passed to setStages)
val bestHashingTFNumFeatures = bestModel.stages(2).asInstanceOf[HashingTF].explainParams
val bestIDFMinDocFrequency = bestModel.stages(3).asInstanceOf[IDFModel].explainParams
val bestWord2VecVectorSize = bestModel.stages(4).asInstanceOf[Word2VecModel].explainParams
val bestDecisionTreeDepth = bestModel.stages(7).asInstanceOf[DecisionTreeClassificationModel].explainParams
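To read the winning combination straight off the zipped pairs, something like the following could work. This is only a minimal sketch: it uses plain Scala collections as stand-ins for `paramGrid` and `avgMetrics`, the `BestParams` object and `rank` helper are hypothetical names, and it assumes the metrics come back in the same order as the grid entries and that a larger metric is better by default.

```scala
// Hypothetical stand-ins: with Spark, `grid` would be the Array[ParamMap] from
// ParamGridBuilder and `avgMetrics` would be crossValidatorModel.avgMetrics.
object BestParams {
  type Params = Map[String, Int]

  // Pair each parameter combination with its average metric and sort best first.
  // Assumes avgMetrics(i) belongs to grid(i).
  def rank(grid: Seq[Params],
           avgMetrics: Seq[Double],
           largerIsBetter: Boolean = true): Seq[(Params, Double)] = {
    require(grid.length == avgMetrics.length, "expected one metric per grid entry")
    val paired = grid.zip(avgMetrics)
    if (largerIsBetter) paired.sortBy(-_._2) else paired.sortBy(_._2)
  }
}
```

With made-up illustrative values, `BestParams.rank(Seq(Map("maxDepth" -> 3), Map("maxDepth" -> 5)), Seq(0.81, 0.86)).head` returns the `maxDepth -> 5` entry together with its metric.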
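The zip works, but I really don't like it, because it relies on internal knowledge of how CrossValidator works. They could change how the metrics array is built, so that in the next version it comes out in a different order, and you would be in trouble without knowing it, because your code would still run. I want a model's parameters returned together with its metrics. I would also like to see summary statistics, not just the average. How useful is an average without a standard deviation? – Turbo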
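If the per-fold metrics for a grid entry are available (for example, collected from a hand-rolled k-fold loop, which is an assumption here and not something shown in the answer), the mean and standard deviation can be computed directly. A minimal sketch with hypothetical names `FoldStats` and `summarize`, using the sample standard deviation (n - 1 denominator):

```scala
object FoldStats {
  final case class Summary(mean: Double, stdDev: Double)

  // Mean and sample standard deviation of one grid entry's per-fold metrics.
  // `foldMetrics` is a hypothetical input: one metric value per fold.
  def summarize(foldMetrics: Seq[Double]): Summary = {
    require(foldMetrics.nonEmpty, "need at least one fold")
    val n = foldMetrics.length
    val mean = foldMetrics.sum / n
    // With a single fold the spread is undefined; report 0.0 in that case.
    val variance =
      if (n > 1) foldMetrics.map(m => (m - mean) * (m - mean)).sum / (n - 1)
      else 0.0
    Summary(mean, math.sqrt(variance))
  }
}
```

For instance, `FoldStats.summarize(Seq(1.0, 2.0, 3.0))` gives a mean of 2.0 and a standard deviation of 1.0.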