How do I convert a list of DTOs into the Spark ML input Dataset format? (Spark MLlib classification input format, using Java)
My DTO:
import java.io.Serializable;

public class MachineLearningDTO implements Serializable {
    private double label;
    private double[] features;

    public MachineLearningDTO() {
    }

    public MachineLearningDTO(double label, double[] features) {
        this.label = label;
        this.features = features;
    }

    public double getLabel() {
        return label;
    }

    public void setLabel(double label) {
        this.label = label;
    }

    public double[] getFeatures() {
        return features;
    }

    public void setFeatures(double[] features) {
        this.features = features;
    }
}
And the code:
Dataset<MachineLearningDTO> mlInputDataSet = spark.createDataset(mlInputData, Encoders.bean(MachineLearningDTO.class));
LogisticRegression logisticRegression = new LogisticRegression();
LogisticRegressionModel model = logisticRegression.fit(MLUtils.convertMatrixColumnsToML(mlInputDataSet));
After running this code I get:
java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@... but was actually ArrayType(DoubleType,false).
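From what I understand, Encoders.bean maps the double[] field to ArrayType(DoubleType, false), while ml estimators such as LogisticRegression require the features column to carry the ml VectorUDT type. Below is a minimal sketch of one possible conversion, building a Dataset<Row> with an explicit schema; it assumes mlInputData is a java.util.List<MachineLearningDTO>, spark is the active SparkSession, and jsc is a JavaSparkContext (the variable names are my assumptions):

import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.linalg.VectorUDT;
import org.apache.spark.ml.linalg.Vectors;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Declare "features" as the ml VectorUDT instead of an array of doubles
StructType schema = new StructType(new StructField[]{
        new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
        new StructField("features", new VectorUDT(), false, Metadata.empty())
});

// Map each DTO to a Row whose second column is a dense ml Vector
JavaRDD<Row> rows = jsc.parallelize(mlInputData)
        .map(dto -> RowFactory.create(dto.getLabel(), Vectors.dense(dto.getFeatures())));

Dataset<Row> mlInput = spark.createDataFrame(rows, schema);
LogisticRegressionModel model = new LogisticRegression().fit(mlInput);

With the explicit schema the features column already has the VectorUDT type, so the MLUtils conversion should no longer be needed.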
If I change the code to use org.apache.spark.ml.linalg.VectorUDT:
VectorUDT vectorUDT = new VectorUDT();
vectorUDT.serialize(Vectors.dense(......));
Then I get:
java.lang.UnsupportedOperationException: Cannot infer type for class org.apache.spark.ml.linalg.VectorUDT because it is not bean-compliant
at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$serializerFor(JavaTypeInference.scala:437)
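From the stack trace it looks like Encoders.bean only handles types that follow JavaBean conventions, which VectorUDT does not, so the vector cannot be serialized through the bean encoder at all. As an alternative, if Spark 3.1 or later is available (an assumption on my part; the function does not exist in earlier releases), the built-in org.apache.spark.ml.functions.array_to_vector could re-type the array column in place while keeping the bean encoder:

import static org.apache.spark.ml.functions.array_to_vector;

import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Convert the "features" column from ArrayType(DoubleType) to the ml Vector type
Dataset<Row> withVector = mlInputDataSet
        .withColumn("features", array_to_vector(mlInputDataSet.col("features")));
LogisticRegressionModel model = new LogisticRegression().fit(withVector);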