2014-10-28 39 views
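Pig script: count on a null field returns 0

I have a Pig script that loads a file through the "company" part of the JSON. When I run the count, it comes back as 0 whenever the domain is missing (or null) in the file. How can I group those records under an empty string and still have them counted?

Example file: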

{"company": {"domain": "test1.com", "name": "test1 company"}} 
{"company": {"domain": "test1.com", "name": "test1 company"}} 
{"company": {"domain": "test1.com", "name": "test2 company"}} 
{"company": {"domain": "test2.com", "name": "test2 company"}} 
{"company": {"domain": "test2.com", "name": "test3 company"}} 
{"company": {"domain": "test3.com", "name": "test3 company"}} 
{"company": {"domain": "test3.com", "name": "test3 company"}} 
{"company": {"name": "test4 company"}} 
{"company": {"name": "test4 company"}} 

Expected results:

"test1.com", "test1 company", 2 
"test1.com", "test2 company", 1 
"test2.com", "test2 company", 1 
"test2.com", "test3 company", 1 
"test3.com", "test3 company", 2 
"", "test4 company", 2 

Actual results:

"test1.com", "test1 company", 2 
"test1.com", "test2 company", 1 
"test2.com", "test2 company", 1 
"test2.com", "test3 company", 1 
"test3.com", "test3 company", 2 
, "test4 company", 0 

Current Pig script:

data = LOAD 'myfile' USING org.apache.pig.piggybank.storage.JsonLoader('company: (domain:chararray, name:chararray)');
filtered = FILTER data BY (company is not null); 
events = FOREACH filtered GENERATE FLATTEN(company) as (domain, name); 
grouped = GROUP events BY (domain, name); 
counts = FOREACH grouped GENERATE group as domain, COUNT(events) as count; 
ordered = ORDER counts by count DESC; 

Thanks for your help!

Answer

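Instead of COUNT, try COUNT_STAR. COUNT skips any tuple whose first field is null, which is why the rows with a missing domain come out as 0; COUNT_STAR counts every tuple in the bag.

counts = FOREACH grouped GENERATE group as domain, COUNT_STAR(events) as count;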
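
COUNT_STAR fixes the count, but the missing domain would still print as null rather than "". A minimal sketch of one way to get the empty string from the expected results, assuming the same loader and schema as in the question; the alias names `cleaned` and `cnt` are made up for illustration, and this has not been run against the asker's data:

```pig
data = LOAD 'myfile' USING org.apache.pig.piggybank.storage.JsonLoader('company: (domain:chararray, name:chararray)');
filtered = FILTER data BY (company is not null);
events = FOREACH filtered GENERATE FLATTEN(company) AS (domain, name);

-- Replace a missing (null) domain with an empty string before grouping.
-- "cleaned" is a hypothetical alias introduced for this sketch.
cleaned = FOREACH events GENERATE ((domain is null) ? '' : domain) AS domain, name;

grouped = GROUP cleaned BY (domain, name);

-- FLATTEN(group) splits the group key back into its two fields;
-- COUNT_STAR counts every tuple in the bag, nulls included.
counts = FOREACH grouped GENERATE FLATTEN(group) AS (domain, name), COUNT_STAR(cleaned) AS cnt;
ordered = ORDER counts BY cnt DESC;
```

Once the null has been replaced, the first field is never null, so plain COUNT would presumably give the same numbers; COUNT_STAR is kept here to stay in line with the answer.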

Works! Thanks!! – 2014-10-28 16:41:11