2017-10-19 150 views

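Creating a Spark Dataset from Java objects in Scala, Spark 1.6

I have POJOs defined in an external jar, and I want to create a Dataset from those objects. If I create the Dataset from a Scala case class, I get the Dataset I expect. If I try to do the same with the Java object, it appears to put all the data into a single column as one object.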

case class patientDiagnosis(patientId: Long, visitId: Long, diagnosisCode: String, isPrimaryDiagnosis: String, patientDiagnosisId: Long, sourceSystemUniqueIdentifier: String, diagnosisCodeSystem: String) {} 

println("case Dataset from scala object :") 
joinDf.as[patientDiagnosis].show() 

OUTPUT: 
case Dataset from scala object : 
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+ 
|patientId|visitId|diagnosisCode|isPrimaryDiagnosis|patientDiagnosisId|sourceSystemUniqueIdentifier|diagnosisCodeSystem| 
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+ 
|  1388158|1764555|       296.20|                 1|           1247383|                     1247383|               ICD9|
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+ 

When I try to do the same with the Java object, I get the following output:

JAVA Object: 

public class PatientDiagnosis implements Serializable{ 

private static final long serialVersionUID = -7971192387675901350L; 

private long patientId; 
private long visitId; 
private String diagnosisCode; 
private String isPrimaryDiagnosis; 
private long patientDiagnosisId; 
private String sourceSystemUniqueIdentifier; 
private int isDeleted; 
private String diagnosisCodeSystem; 
} 

scala code: 

import org.apache.spark.sql.{Encoder, Encoders}
import sqlContext.implicits._
val p:Encoder[com....PatientDiagnosis] = Encoders.bean(classOf[com....PatientDiagnosis]) 
println("case Java Encoder :") 
joinDiagnf.as[com....PatientDiagnosis](p).show(false) 

OUTPUT: 
case Java Encoder : 
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+ 
|diagnosisCode                                                |diagnosisCodeSystem|isDeleted|isPrimaryDiagnosis|patientDiagnosisId|patientId|sourceSystemUniqueIdentifier|visitId| 
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+ 
|PatientDiagnosis [patientId=0, visitId=1764555, diagnosisCode=296.20, isPrimaryDiagnosis=1, patientDiagnosisId=1247383, sourceSystemUniqueIdentifier=1247383, isDeleted=0, diagnosisCodeSystem=ICD9]| 
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+ 

Am I making a syntax error, or is creating a Dataset of Java objects from Scala not supported in Spark 1.6?


What is the schema of 'joinDiagnf'? –


The same as the corresponding object – Kalpesh

Answer


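Sorry, my mistake: it does give the correct output. I did not notice this earlier because the dataset.show view did not render the result clearly. When I select specific columns, those columns contain the expected values.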
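
The check described in this answer, looking at the schema and at individual columns instead of relying on the default show() rendering, could be sketched roughly as follows. This is only a sketch under assumptions: the object DatasetInspect and the helper inspectColumns are hypothetical names, not part of Spark or of the original code. It relies only on Dataset.printSchema, Dataset.toDF, DataFrame.select and DataFrame.show(truncate), which are available in Spark 1.6.

```scala
import org.apache.spark.sql.Dataset

// Hypothetical helper (not from the original post).
object DatasetInspect {

  // Prints the schema produced by the encoder, then shows only the
  // requested columns without truncating their values.
  def inspectColumns[T](ds: Dataset[T], first: String, rest: String*): Unit = {
    // One schema entry per bean property indicates that the data was
    // split into separate columns rather than packed into a single one.
    ds.printSchema()

    // Convert to a DataFrame so that columns can be selected by name.
    ds.toDF().select(first, rest: _*).show(false)
  }
}
```

With the names from the question, a call might look like DatasetInspect.inspectColumns(joinDiagnf.as[com....PatientDiagnosis](p), "patientId", "visitId", "diagnosisCode"). Selecting a few narrow columns keeps each row on one line, which makes it easier to see whether every column carries its own value.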