2016-07-24 57 views
0

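Apache Pig word count program: in a word-count program, how can I find the most frequently occurring word and the least frequently occurring word in Pig? How do I use the MAX function here?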

The output I see looks like this:

(Naveen,3) (is,5)

From this output, the result I need is "is".

Answers

0

You can use ORDER BY and LIMIT:

A = load 'file' using PigStorage() as (name:chararray, count:int);

B = order A by count; -- ascending by default; add desc for descending order

C = limit B 1;

D = foreach C generate name;

dump D;

B = order A by count desc; -- descending, so the most frequent word comes first

C = limit B 1;

D = foreach C generate name;

dump D;
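
Since the question asks about MAX specifically, here is a minimal sketch of that route, assuming the same tab-separated (name, count) input as above. The field is renamed to cnt and the aliases G, extremes, most and least are illustrative names, not part of the original answer. Unlike limit 1, this keeps every word that ties for the highest or lowest count.

```pig
-- 'file' is the same placeholder path used in the answer above
A = load 'file' using PigStorage() as (name:chararray, cnt:int);

-- Put every row into a single group so MAX and MIN see the whole relation
G = group A all;
extremes = foreach G generate MAX(A.cnt) as max_cnt, MIN(A.cnt) as min_cnt;

-- extremes has exactly one row, so its fields can be used as scalars
most = filter A by cnt == extremes.max_cnt;
least = filter A by cnt == extremes.min_cnt;

most_names = foreach most generate name;
least_names = foreach least generate name;

dump most_names;
dump least_names;
```

For the sample output in the question, most_names should contain the word with count 5.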

0

The following example will help you get the top five counts:

infiles = load '/hdfs/bhavesh/Youtube_POC/Youtube/0222/{0,1,2,3,4}.txt' using PigStorage('\t') as 
(videoid:chararray,uploader:chararray,age:int,category:chararray,length:int,views:int,rate:int,rating:int,comments:int,related_id:chararray); 
files = FILTER infiles BY category is not null; 
grpn_for_catagories = group files by category; 
cnt_for_catagories = foreach grpn_for_catagories generate group, COUNT(files.videoid) as counting; 
sorted_for_catagories_desc = order cnt_for_catagories by counting desc; 
top5_for_catagories = limit sorted_for_catagories_desc 5; 

For a detailed explanation, see

http://ybhavesh.blogspot.in/2015/08/proof-of-concept-or-poc-on-youtube-data.html

Hope it helps!

+0

Thanks, Bhavesh. – Naveen

+0

You're welcome, Naveen! – Bhavesh

0

A = load 'file' using PigStorage() as (name:chararray, count:int);

B = order A by count;

C = limit B 1;

D = foreach C generate name;

dump D;
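
For context, the (name, count) pairs loaded in the snippets above could come from a word count along the following lines. This is only a sketch: 'input.txt' is a hypothetical path, and the aliases lines, words, grouped, counts, sorted and most are illustrative. It tokenizes each line, counts each word, and then applies the same order and limit pattern.

```pig
-- 'input.txt' is a hypothetical path to a plain text file
lines = load 'input.txt' as (line:chararray);

-- Split each line into words, one word per row
words = foreach lines generate flatten(TOKENIZE(line)) as word;

-- Count how often each word occurs
grouped = group words by word;
counts = foreach grouped generate group as word, COUNT(words) as cnt;

-- Most frequent word first; use the default ascending order for the least frequent
sorted = order counts by cnt desc;
most = limit sorted 1;

dump most;
```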