0
自定義對象我想叫API,它預計Employee對象,如下圖所示:轉換火花數據集在斯卡拉
public class EmployeeElements {
private Set<Long> eIds;
private Map<Long, List<EmployeeDetails>> employeeDetails;
private Map<Long, List<Address>> address;
}
我想創建格式對象:Map[Long,List[CustomObject]]
了數據集
由於輸入我們將有以下數據框。
EmployeeDF
+---------+-------------------+
|emp_Id |hired_Date |
+---------+-------------------+
|1387779 |2016-09-27 19:47:28|
|1387781 |2016-09-27 19:47:28|
|1387780 |2016-09-27 19:47:28|
|1387780 |2016-09-27 19:47:28|
|1387777 |2016-09-27 19:47:28|
|1387778 |2016-09-27 19:47:28|
+---------+-------------------+
EmployeeDetailsDF:
+---------+---------+--------+----------+-------------------+
|emp_Id |firstname|lastname|gendercode|dateofbirth |
+---------+---------+--------+----------+-------------------+
|1387777 |Jon |Snow |F |1985-01-01 00:00:00|
|1387778 |Jon |Snow |M |1985-01-01 00:00:00|
|1387779 |Jon |Snow |F |1985-01-01 00:00:00|
|1387780 |Jon |Snow |F |1985-01-01 00:00:00|
|1387781 |Jon |Snow |F |1985-01-01 00:00:00|
+---------+---------+--------+----------+-------------------+
AddressDf:
+---------+------------------+-----------------+-------------------+
|emp_Id |patient_address_id|Country |joined_Date |
+---------+------------------+-----------------+-------------------+
|1387779 |2435146 |USA |2016-09-27 19:47:28|
|1387781 |2435148 |AUS |2016-09-27 19:47:28|
|1387780 |2435147 |USA |2016-09-27 19:47:28|
|1387780 |2435149 |UK |2016-09-27 19:47:28|
|1387777 |2435144 |AUS |2016-09-27 19:47:28|
|1387778 |2435145 |USA |2016-09-27 19:47:28|
+---------+------------------+-----------------+-------------------+
EmployeeDetailsDF
將獲得參加與EmployeeDF
。 同樣AddressDf
也將加入EmployeeDF
。
Now out of that join Dataframe I wanted to create `Map[Long,List[CustomObject]]`, for example: couple of rows of joindDataFrame of AddressDf and EmployeeDF
(1387777,List([1387777,2435144,AUS]))
(1387780,List([1387780,2435147,USA], [1387780,2435149,UK]))
Exception:
Exception in thread "main" java.lang.ClassCastException: org.apache.spark.sql.GroupedDataset cannot be cast to scala.collection.immutable.Map
when I am trying to cast groupAddressDs to Map:
//empAddressDs is the joined Dataset of AddressDf and EmployeeDF
val groupAddressDs:GroupedDataset[Long, Address] = empAddressDs.groupBy { x => x.emp_Id }
val addressMap = groupAddressDs.asInstanceOf[Map[String, List[Procedure]]]
但我沒有得到同樣的任何解決方案,我試圖與GroupedDataset
但最終我們無法施展它映射對象它給人類轉換異常, 然後用pairRdd
嘗試過,但後來我我無法將rdd的行再次轉換爲我的自定義對象。
我必須處理EmployeeElements
對象中的數百萬個數據和數百個自定義對象。
您是否試圖將百萬數據編組爲一個EmployeeElements對象?你最終需要用EmployeeElements做什麼? – josephpconley
@josephpconley:我想將EmployeeElements對象傳遞給外部API。 – Kalpesh