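How can I use Algebird's HyperLogLogMonoid to perform arbitrary intersections and unions? I want to merge the set of values belonging to each category into an HLL data structure, so that later I can perform intersections and unions and compute the cardinality of the results.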
I have got as far as using com.twitter.algebird.HyperLogLogAggregator:
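What I need is to use com.twitter.algebird.HyperLogLogMonoid to store each group as an HLL, which would then be used to estimate the cardinality of each group and, later, to compute intersections and unions.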
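For reference, here is a minimal sketch of the set operations being asked about, on plain Scala collections rather than a Scalding pipe. The `sketch` helper and the `user-N` ids are made up for illustration. `create` is the same call used later in this question; I believe newer Algebird releases mark it as deprecated in favour of `toHLL`, so expect a warning there.

```scala
import com.twitter.algebird.{HLL, HyperLogLogMonoid}

object HllSetOps extends App {
  // 12 bits => 4096 registers, roughly 1.6% standard error
  val hll = new HyperLogLogMonoid(bits = 12)

  // Hypothetical helper: fold a collection of ids into a single HLL
  def sketch(ids: Seq[String]): HLL =
    hll.sum(ids.map(id => hll.create(id.getBytes("UTF-8"))))

  // Two made-up categories sharing the ids user-501 .. user-1000
  val catA = sketch((1 to 1000).map(i => s"user-$i"))
  val catB = sketch((501 to 1500).map(i => s"user-$i"))

  // Union: the monoid's plus merges two sketches
  val union = hll.plus(catA, catB)
  println(hll.sizeOf(union).estimate)

  // Intersection: estimated from the sketches via inclusion-exclusion
  println(hll.intersectionSize(Seq(catA, catB)).estimate)
}
```

The true sizes here are 1500 for the union and 500 for the intersection. Both printed values are estimates, and the intersection estimate usually carries a larger relative error than the union, since it is derived from several union estimates.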
    val lines_parsed = lines.map { line => parseBlueKaiLogEntry(line) } // (uuid, [category id array])

    val lines_parsed_flat = lines_parsed.flatMap { case (uuid, category_list) =>
      category_list.toList.map { category_id => (category_id, uuid) }
    } // (category_id, uuid)

    // Group by category
    val lines_parsed_grped = lines_parsed_flat.groupBy { case (cat_id, uuid) => cat_id }

    // Define HLL aggregator
    val hll_uniq = HyperLogLogAggregator
      .sizeAggregator(bits = 12)
      .composePrepare[(String, String)] { case (cat_id, uuid) => uuid.toString.getBytes("UTF-8") }

    // Aggregate using hll count
    lines_parsed_grped.aggregate(hll_uniq).dump // (category_id, count) - expected output
Now I am trying to use the HLL monoid:
    // I now want to store as HLL and this is where I'm not sure what to do
    // Create HLL Monoid
    val hll = new HyperLogLogMonoid(bits = 12)
    val lines_grped_hll = lines_parsed_grped
      .mapValues { case (cat_id: String, uuid: String) => uuid }
      .values
      .map { v: String => hll.create(v.getBytes("UTF-8")) }

    // Calling dump results in a lot more lines than I expect to see
    lines_grped_hll.dump
What am I doing wrong here?
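One likely reading of the symptom: `.values` throws away the category key and the final `map` builds a separate HLL for every row, so the output has one sketch per input row instead of one per category. Nothing ever merges the sketches within a group. The sketch below shows the intended shape on plain Scala collections, with made-up `(category_id, uuid)` pairs standing in for `lines_parsed_flat`. It is an illustration under those assumptions, not a tested Scalding job.

```scala
import com.twitter.algebird.{HLL, HyperLogLogMonoid}

object HllPerCategory extends App {
  val hll = new HyperLogLogMonoid(bits = 12)

  // Made-up (category_id, uuid) pairs standing in for lines_parsed_flat
  val pairs: Seq[(String, String)] = Seq(
    ("sports", "u1"), ("sports", "u2"),
    ("news", "u2"), ("news", "u3"), ("news", "u2")
  )

  // One HLL per category: build a sketch per uuid, then sum them within each key
  val hllByCategory: Map[String, HLL] =
    pairs.groupBy(_._1).map { case (catId, rows) =>
      catId -> hll.sum(rows.map { case (_, uuid) => hll.create(uuid.getBytes("UTF-8")) })
    }

  // Each category now maps to a single sketch whose size can be estimated later
  hllByCategory.foreach { case (catId, sketch) =>
    println(s"$catId -> ${hll.sizeOf(sketch).estimate}")
  }
}
```

In the Scalding pipeline the equivalent would presumably be to keep the grouping and reduce within it, for example by aggregating with `HyperLogLogAggregator(12)` (which, as far as I know, yields the HLL itself rather than its size) composed with the same `composePrepare` as above.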
What kind of result are you expecting? The total count of cats grouped by ID? – FaigB