2016-08-23

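How to count duplicate values in Scala?

I'd like to ask how I can count duplicate values.
The data has the format: user, item, event.
I want to count how many times each item appears.
Here is some sample data: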

US50137,IT1548,7), (US42215,IT6298,7), (US98606,IT5305,7), (US34696,IT5914,7), (US74972,IT2796,7), (US1729,IT7696,7), (US76310,IT9790,7), (US49102,IT6487,7), (US25430,IT7901,7), (US50600,IT4156,7), (US65972,IT9830,7), (US50879,IT1902,7), (US36024,IT6484,7), (US46284,IT3281,7), (US55565,IT5303,7), (US18932,IT2025,7), (US39467,IT8677,7), (US12477,IT9678,7), (US94819,IT8427,7), (US19956,IT1402,7), (US41507,IT3624,7), (US845,IT4823,7), (US18860,IT7860,7), (US68784,IT4759,7), (US79752,IT421,7), (US18563,IT5329,7), (US79628,IT2351,7), (US83729,IT6082,7), (US61097,IT9643,7), (US69368,IT3162,7), (US59566,IT814,7), (US9726,IT7519,7), (US1157,IT5908,7), (US1176,IT3981,7), (US79409,IT8578,7), (US11786,IT5147,7), (US88604,IT8501,7), (US6857,IT2333,7), (US82349,IT6143,7), (US27666,IT9085,7), (US90508,IT352,7), (US48578,IT4503,7), (US14526,IT9551,7), (US29031,IT1992,7), (US57012,IT4353,7), (US97235,IT77,7), (US88666,IT2715,7), (US31035,IT7865,7), (US45054,IT6664,7), (US92069,IT9951,7), (US27175,IT913,7), (US60402,IT8480,7), (US28426,IT9309,7), (US23641,IT4518,7), (US10889,IT7348,7), (US16950,IT6087,7), (US68766,IT683,7), (US87726,IT7594,7), (US63638,IT8101,7), (US78079,IT4344,7), (US47257,IT3315,7), (US3915,IT8971,7), (US59440,IT3441,7), (US64466,IT3980,7), (US79624,IT3502,7), (US29356,IT6778,7) 


The sample comes from this link:

My code:

val RATING_SPLITER = N1.map { baris =>
  (
    baris(0),
    baris(1),
    baris(2) match {
      case "read"  => 10
      case "play"  => 6
      case "share" => 7
    }
  )
}.take(1000)
val MM = RATING_SPLITER.groupBy(kk => kk._2).map(x1 => (x1._2))
MM.foreach(println)

It produces the following output:

[Lscala.Tuple3;@fd53053 
[Lscala.Tuple3;@4527f70a 
[Lscala.Tuple3;@707b1a44 
[Lscala.Tuple3;@7132a9dc 
[Lscala.Tuple3;@57435801 
[Lscala.Tuple3;@2da66a44 
[Lscala.Tuple3;@527fc8e 
[Lscala.Tuple3;@61bfc9bf 
[Lscala.Tuple3;@2c7106d9 
[Lscala.Tuple3;@329bad59 


Any idea why the output looks like this? And does my code count the duplicate values correctly?


Try printing field by field, e.g. 'MM.foreach(tup => println(tup._1 + tup._2 ...))', instead of throwing the whole 'Tuple' object at the output. – sebszyller
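As background to the comment above: the `[Lscala.Tuple3;@...` lines appear because each group in `MM` is an `Array`, and JVM arrays print with the default `Object.toString` rather than their contents. The following is only a minimal sketch of one way to make the groups readable. The `PrintGroups` object, the `ratings` sample, and the `render` helper are hypothetical names standing in for the question's `N1` / `RATING_SPLITER`, which are not shown in full.

```scala
object PrintGroups {
  // Hypothetical sample standing in for the question's RATING_SPLITER (user, item, score).
  val ratings: Array[(String, String, Int)] = Array(
    ("US1", "IT1", 7),
    ("US2", "IT1", 7),
    ("US3", "IT2", 7)
  )

  // groupBy on an Array yields a Map whose values are Arrays;
  // printing such an Array directly gives "[Lscala.Tuple3;@<hash>".
  val groups: Map[String, Array[(String, String, Int)]] = ratings.groupBy(_._2)

  // mkString renders the elements instead of the JVM's default array representation.
  def render(group: Array[(String, String, Int)]): String =
    group.mkString("[", ", ", "]")

  def main(args: Array[String]): Unit =
    groups.foreach { case (item, group) => println(s"$item -> ${render(group)}") }
}
```

This only makes the grouped rows visible. It does not count anything yet, since the question's code keeps the groups themselves rather than their sizes.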

Answers


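You should map the values you get from groupBy to their sizes. groupBy creates key-value pairs in which each value is the collection of all items sharing the same key, and you are only interested in the size of that collection: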

// sample data: 
val RATING_SPLITER = List(("A", "b", 4), ("A", "b", 5), ("A", "c", 6), ("A", "e", 7)) 

val result: Map[String,Int] = RATING_SPLITER.groupBy(_._2).mapValues(_.size) 
result.foreach(println) 
// prints: 
// (e,1) 
// (b,2) 
// (c,1) 
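A note for newer compilers: as far as I know, on Scala 2.13 and later `mapValues` on a `Map` is deprecated and returns a lazy `MapView` instead of a `Map`, so the `Map[String,Int]` annotation above may not compile there as written. The sketch below does the same counting with a plain `map` over the groups, using the same sample data. `CountItems` and `countByItem` are hypothetical names introduced only for illustration.

```scala
object CountItems {
  // Counts how many times each item (the second tuple field) occurs.
  def countByItem(rows: Seq[(String, String, Int)]): Map[String, Int] =
    rows.groupBy(_._2).map { case (item, group) => item -> group.size }

  def main(args: Array[String]): Unit = {
    val sample = List(("A", "b", 4), ("A", "b", 5), ("A", "c", 6), ("A", "e", 7))
    // Map iteration order is not guaranteed, so sort by key for stable output.
    countByItem(sample).toSeq.sortBy(_._1).foreach(println)
    // prints:
    // (b,2)
    // (c,1)
    // (e,1)
  }
}
```

Sorting before printing is optional. It is there only because the iteration order of a `Map` should not be relied on.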

OK, thank you very much. It works. –