1
我有一個文本文件說.... 這是阿帕奇豬,像一個魅力工程。 所以我想每次重複每個字符都要計數。 這應該打印... T =計數值t H =遞增H A =計數 B的的= .........地圖縮小框架
Can anyone tell me how do I break my words into characters into Pig.
Any help would be greatly appreciated.
我有一個文本文件說.... 這是阿帕奇豬,像一個魅力工程。 所以我想每次重複每個字符都要計數。 這應該打印... T =計數值t H =遞增H A =計數 B的的= .........地圖縮小框架
Can anyone tell me how do I break my words into characters into Pig.
Any help would be greatly appreciated.
input.txt
This is Apache pig,
works like
a charm
PigScript:
A = LOAD 'input.txt' AS line;
B = FOREACH A GENERATE (REPLACE(line,'','\n')) AS (word:chararray);
C = FOREACH B GENERATE FLATTEN(TOKENIZE(word,'\n'));
D = GROUP C BY $0;
E = FOREACH D GENERATE group,COUNT($1);
DUMP E;
Output:
(,6)
(,,1)
(A,1)
(T,1)
(a,3)
(c,2)
(e,2)
(g,1)
(h,3)
(i,4)
(k,2)
(l,1)
(m,1)
(o,1)
(p,2)
(r,2)
(s,3)
(w,1)