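Generate the diff of List[String] in Scalding
In my Scalding job I have records: TypedPipe[(String, util.List[String])], where the first value is an id and the second value is a list of things. For a given id, I want to output only the records that differ from one another. Imagine the following input: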
("1", ["a","b","c"])
("1", ["a","b","c"])
("1", ["a","b","c"])
("2", ["a","b"])
("2", ["a","b","c"])
("3", ["a","b","c"])
The filtering would come after records.groupBy(_._1). For the input above, the output should be:
("2", ["a","b"])
("2", ["a","b","c"])
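To pin down the behaviour I am after, here is a minimal sketch on plain Scala collections. It only models the semantics locally and is not Scalding code: the object name `DiffByIdSketch` and the helper `differing` are made up for illustration, and I am assuming that an id should be kept only when its group holds more than one distinct list, with each distinct list emitted once.

```scala
import java.util.{Arrays, List => JList}

object DiffByIdSketch {
  type Record = (String, JList[String])

  // Hypothetical helper: keep only ids whose records are not all identical,
  // emitting each distinct record of such an id once.
  def differing(records: Seq[Record]): Seq[Record] =
    records
      .groupBy(_._1)   // id -> all records carrying that id
      .toSeq
      .sortBy(_._1)    // deterministic order, for display only
      .flatMap { case (_, group) =>
        // java.util.List compares by content, so distinct works as intended
        val unique = group.distinct
        if (unique.size > 1) unique else Seq.empty
      }

  def main(args: Array[String]): Unit = {
    val records: Seq[Record] = Seq(
      ("1", Arrays.asList("a", "b", "c")),
      ("1", Arrays.asList("a", "b", "c")),
      ("1", Arrays.asList("a", "b", "c")),
      ("2", Arrays.asList("a", "b")),
      ("2", Arrays.asList("a", "b", "c")),
      ("3", Arrays.asList("a", "b", "c"))
    )
    // Only the two records for id "2" should survive.
    differing(records).foreach(println)
  }
}
```

Id "1" is dropped because its three records are identical, and id "3" is dropped because it has a single record. What I am looking for is the equivalent of this on a grouped TypedPipe.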
I am new to Scalding. What is an elegant way to achieve this?
Yes, it has to run on a cluster. Scalding is essential. – Gevorg