Find the most frequent value with Pig

I have the following dataset:
dump DATA_INPUT;
(0000001686601081020,10A)
(0000001686601081020,08D)
(0000001686601081020,08D)
(0000001686601081020,08D)
(0000001686601081020,09D)
(0000001686601081020,09D)
(0000001686601081020,08D)
(0000001686601081020,08D)
(0000001686601081020,08D)
(0000001686676950125,0A1)
(0000001686676950125,0A1)
(0000001686676950125,0A2)
Column $0 is ACCOUNT_ID, column $1 is the cell ID.
For each account_id I need to find the most frequent cell ID.
As a first step, this is what I tried:
grpd = GROUP DATA_INPUT BY ($0, $1);
cells_count = FOREACH grpd GENERATE group, COUNT(DATA_INPUT.$1) AS count;
all_cells_counts = GROUP cells_count BY group.$0;
top_cell = FOREACH all_cells_counts {
    A = ORDER cells_count BY count DESC;
    B = LIMIT A 1;
    GENERATE FLATTEN(B.group);
}
The result I get:
((0000001686601081020,08D))
((0000001686676950125,0A1))
How can I get rid of the extra parentheses, so that the result looks like this:
(0000001686601081020,08D)
(0000001686676950125,0A1)
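
One possible way to get there, offered as a sketch rather than a definitive answer: the extra parentheses appear because `group` is itself a tuple of ($0, $1), so `B.group` is a bag of nested tuples, and a single FLATTEN only removes the bag level. Flattening `group` as soon as the counts are computed keeps the fields at the top level. The aliases `account_id`, `cell_id` and `cnt` below are names chosen for illustration and are not part of the original script; DATA_INPUT is assumed to be loaded as in the dump above.

```pig
-- Count occurrences of each (account, cell) pair.
grpd = GROUP DATA_INPUT BY ($0, $1);

-- Flatten the group tuple right away so both fields become top-level columns.
-- account_id, cell_id and cnt are illustrative aliases (assumptions).
cells_count = FOREACH grpd GENERATE
    FLATTEN(group) AS (account_id, cell_id),
    COUNT(DATA_INPUT) AS cnt;

-- Regroup by account and keep the cell with the highest count.
all_cells_counts = GROUP cells_count BY account_id;
top_cell = FOREACH all_cells_counts {
    A = ORDER cells_count BY cnt DESC;
    B = LIMIT A 1;
    -- B now holds flat fields, so one FLATTEN yields (account_id, cell_id).
    GENERATE FLATTEN(B.(account_id, cell_id));
};

DUMP top_cell;
```

With this shape each output row should be a flat (account_id, cell_id) tuple. Note that if two cells tie for the highest count within an account, LIMIT keeps only one of them, and which one is not guaranteed.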
Thank you very much! That is what I tried to do in many ways, without success :) – Marta 2014-10-31 17:33:16