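Reduce a list of case class instances to a count of instances
I currently have an RDD of tuples of the form ((id, code), (list of events with keys id and code)). In the example below, the ID is 000406106-01, the code is 496, and each individual event is an instance of the Diagnostic case class. What I would like to get is an RDD of the form ((id, code), count of events). Basically, I want to collapse the CompactBuffer of Diagnostic events into a count of those events. Any suggestions?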
ID CODE EVENT1 EVENT2
((000406106-01,496),CompactBuffer(Diagnostic(000406106-01,Sun Apr 16 02:24:00 UTC 2006,496), Diagnostic(000406106-01,Fri Jul 20 15:30:00 UTC 2012,496), Diagnostic(000406106-01,Tue Dec 23 17:00:00 UTC 2014,496), Diagnostic(000406106-01,Wed Jan 06 20:45:00 UTC 2010,496), Diagnostic(000406106-01,Fri Mar 04 16:30:00 UTC 2011,496), Diagnostic(000406106-01,Sun Aug 04 04:51:00 UTC 2013,496), Diagnostic(000406106-01,Fri Mar 11 16:00:00 UTC 2011,496), Diagnostic(000406106-01,Tue Jul 10 13:45:00 UTC 2012,496), Diagnostic(000406106-01,Wed Jun 15 20:00:00 UTC 2005,496), Diagnostic(000406106-01,Tue Dec 29 13:30:00 UTC 2009,496), Diagnostic(000406106-01,Fri Jul 13 13:30:00 UTC 2012,496), Diagnostic(000406106-01,Thu Jul 26 03:40:00 UTC 2007,496), Diagnostic(000406106-01,Mon Jun 13 14:45:00 UTC 2005,496), Diagnostic(000406106-01,Wed Dec 24 18:00:00 UTC 2014,496), Diagnostic(000406106-01,Thu Mar 03 15:45:00 UTC 2011,496), Diagnostic(000406106-01,Wed Dec 31 15:00:00 UTC 2014,496), Diagnostic(000406106-01,Sat Jul 26 04:39:00 UTC 2008,496), Diagnostic(000406106-01,Thu Dec 31 20:30:00 UTC 2009,496)))
What I'm looking for:
ID CODE COUNT
((000406106-01,496), 20)
Edit: for clarity, here is how the RDD above is generated:
val grpDiag = diagnostic.groupBy(diag => (diag.id, diag.code))
where diagnostic is the ungrouped RDD of the data above.
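For reference, here is a minimal sketch of two ways the count could be obtained. The definition of Diagnostic is an assumption: only the id and code fields are confirmed by the groupBy call above, and the date field name and its String type are guesses. The object name CountDiagnostics and the helper names countFromGroups and countDirectly are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Assumed shape of the case class: id and code appear in the question,
// the date field and its String type are a guess.
case class Diagnostic(id: String, date: String, code: String)

object CountDiagnostics {
  // Option 1: keep the existing groupBy result and replace each
  // group of events by its size.
  def countFromGroups(
      grouped: RDD[((String, String), Iterable[Diagnostic])]
  ): RDD[((String, String), Int)] =
    grouped.mapValues(_.size)

  // Option 2: skip groupBy, emit a 1 per event and sum per key.
  // This avoids building the full list of events for each key.
  def countDirectly(diagnostic: RDD[Diagnostic]): RDD[((String, String), Int)] =
    diagnostic.map(d => ((d.id, d.code), 1)).reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("count-diagnostics").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Three of the events from the sample above, as the ungrouped RDD.
    val diagnostic = sc.parallelize(Seq(
      Diagnostic("000406106-01", "Sun Apr 16 02:24:00 UTC 2006", "496"),
      Diagnostic("000406106-01", "Fri Jul 20 15:30:00 UTC 2012", "496"),
      Diagnostic("000406106-01", "Tue Dec 23 17:00:00 UTC 2014", "496")
    ))

    val grpDiag = diagnostic.groupBy(diag => (diag.id, diag.code))

    // Both print ((000406106-01,496),3) for this three-event sample.
    countFromGroups(grpDiag).collect().foreach(println)
    countDirectly(diagnostic).collect().foreach(println)

    sc.stop()
  }
}
```

If the grouped events are not needed for anything else, the second variant is usually the cheaper one, since reduceByKey combines partial sums on each partition before shuffling, whereas groupBy moves every event across the network first.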