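Create a Spark Dataset from Java objects in Scala (Spark 1.6)
I have POJOs defined in an external jar, and I want to create a Dataset from these objects. If I create the Dataset from a Scala case class, I get the Dataset I expect. If I try the same thing with the Java object, all the data ends up in a single column as one object.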
case class patientDiagnosis(patientId: Long, visitId: Long, diagnosisCode: String, isPrimaryDiagnosis: String, patientDiagnosisId: Long, sourceSystemUniqueIdentifier: String, diagnosisCodeSystem: String) {}
println("case Dataset from scala object :")
joinDf.as[patientDiagnosis].show()
OUTPUT:
case Dataset from scala object :
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+
|patientId|visitId|diagnosisCode|isPrimaryDiagnosis|patientDiagnosisId|sourceSystemUniqueIdentifier|diagnosisCodeSystem|
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+
| 1388158|1764555| 296.20| 1| 1247383| 1247383| ICD9|
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+
When I try to do the same with the Java object, I get the following output:
JAVA Object:
public class PatientDiagnosis implements Serializable {
    private static final long serialVersionUID = -7971192387675901350L;
    private long patientId;
    private long visitId;
    private String diagnosisCode;
    private String isPrimaryDiagnosis;
    private long patientDiagnosisId;
    private String sourceSystemUniqueIdentifier;
    private int isDeleted;
    private String diagnosisCodeSystem;
}
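For context, `Encoders.bean` infers its schema from JavaBean properties, that is, from public getter/setter pairs rather than from private fields. The listing above shows only the fields, so the sketch below assumes conventional accessors (for example `getIsDeleted`/`setIsDeleted`) and uses the standard `java.beans.Introspector` to list the property names such a class would expose. `BeanPropertySketch`, `PatientDiagnosisBean`, and `propertyNames` are hypothetical names for illustration; this is not the actual class from the jar, and it does not call Spark.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanPropertySketch {

    // Hypothetical bean mirroring the fields in the question.
    // The getters and setters are an assumption; the original listing omits them.
    public static class PatientDiagnosisBean implements Serializable {
        private static final long serialVersionUID = 1L;
        private long patientId;
        private long visitId;
        private String diagnosisCode;
        private String isPrimaryDiagnosis;
        private long patientDiagnosisId;
        private String sourceSystemUniqueIdentifier;
        private int isDeleted;
        private String diagnosisCodeSystem;

        public long getPatientId() { return patientId; }
        public void setPatientId(long v) { this.patientId = v; }
        public long getVisitId() { return visitId; }
        public void setVisitId(long v) { this.visitId = v; }
        public String getDiagnosisCode() { return diagnosisCode; }
        public void setDiagnosisCode(String v) { this.diagnosisCode = v; }
        public String getIsPrimaryDiagnosis() { return isPrimaryDiagnosis; }
        public void setIsPrimaryDiagnosis(String v) { this.isPrimaryDiagnosis = v; }
        public long getPatientDiagnosisId() { return patientDiagnosisId; }
        public void setPatientDiagnosisId(long v) { this.patientDiagnosisId = v; }
        public String getSourceSystemUniqueIdentifier() { return sourceSystemUniqueIdentifier; }
        public void setSourceSystemUniqueIdentifier(String v) { this.sourceSystemUniqueIdentifier = v; }
        public int getIsDeleted() { return isDeleted; }
        public void setIsDeleted(int v) { this.isDeleted = v; }
        public String getDiagnosisCodeSystem() { return diagnosisCodeSystem; }
        public void setDiagnosisCodeSystem(String v) { this.diagnosisCodeSystem = v; }
    }

    // Returns the names of all readable and writable bean properties, sorted by name.
    public static List<String> propertyNames(Class<?> beanClass) {
        List<String> names = new ArrayList<>();
        try {
            // Stop at Object.class so the synthetic "class" property is excluded.
            PropertyDescriptor[] props =
                Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors();
            for (PropertyDescriptor pd : props) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass, e);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Print one property name per line.
        for (String name : propertyNames(PatientDiagnosisBean.class)) {
            System.out.println(name);
        }
    }
}
```

The sorted property names are the same eight names, in the same order, as the column header in the "case Java Encoder" output below. That suggests the encoder did resolve the bean's properties and that the problem lies in how the row is displayed.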
Scala code:
import org.apache.spark.sql.{Encoder, Encoders}
import sqlContext.implicits._
val p:Encoder[com....PatientDiagnosis] = Encoders.bean(classOf[com....PatientDiagnosis])
println("case Java Encoder :")
joinDiagnf.as[com....PatientDiagnosis](p).show(false)
OUTPUT:
case Java Encoder :
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+
|diagnosisCode |diagnosisCodeSystem|isDeleted|isPrimaryDiagnosis|patientDiagnosisId|patientId|sourceSystemUniqueIdentifier|visitId|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+
|PatientDiagnosis [patientId=0, visitId=1764555, diagnosisCode=296.20, isPrimaryDiagnosis=1, patientDiagnosisId=1247383, sourceSystemUniqueIdentifier=1247383, isDeleted=0, diagnosisCodeSystem=ICD9]|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+
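Am I making a syntax mistake, or is creating a Dataset of Java objects from Scala not supported in Spark 1.6?
What is the schema of 'joinDiagnf'? –
The same as each object. – Kalpesh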