You can treat both the indexers and the encoders as PipelineStage instances: add them all to your pipeline and run the whole pipeline in one step. Example:
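import java.util.ArrayList;
import java.util.Arrays;
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.feature.OneHotEncoder;
import org.apache.spark.ml.feature.StringIndexer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Suffixes for the intermediate index column and the final one-hot vector column.
String INDEX_APPENDIX = "_IDX";
String VECTOR_APPENDIX = "_VEC";
ArrayList<PipelineStage> stages = new ArrayList<>();
for (String column : Arrays.asList("col1", "col2")) {
    // Index the string column, then one-hot encode the resulting index column.
    stages.add(new StringIndexer().setInputCol(column).setOutputCol(column + INDEX_APPENDIX));
    stages.add(new OneHotEncoder().setInputCol(column + INDEX_APPENDIX).setOutputCol(column + VECTOR_APPENDIX));
}
Pipeline pipeline = new Pipeline()
    .setStages(stages.toArray(new PipelineStage[stages.size()]));
// df is the existing input Dataset<Row> that contains col1 and col2.
Dataset<Row> processedDf = pipeline.fit(df).transform(df);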
Source
2017-02-21 11:24:57
moe