2017-02-05

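PIG: Invalid scalar projection with the column names of my data

The data I have has these columns:

keyword, campaign_id, date, time, display_site, was_clicked, cpc, country, placement

What I am trying to do is find the keywords with a high click-through rate (CTR).

So I am trying to understand why the code below gives me an "Invalid scalar projection" error: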

grouped = GROUP data BY keyword;
by_keyword = FOREACH grouped
{
    clicked = FILTER data BY was_clicked == 1;
    total = COUNT(data.keyword);
    GENERATE group, ((double)COUNT(clicked)/total) AS ctr;
}

The error I get:

37,632 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: 
<line 59, column 33> Invalid scalar projection: clicked : A column needs to be projected from a relation for it to be used as a scalar 
Details at logfile: /home/cloudera/pig_1486224821223.log 

Any help would be appreciated.

Edit:

data = LOAD '/user/cloudera/pig_demo/ad_data.txt' AS (keyword:chararray,campaign_id:chararray, 
     date:chararray, time:chararray,display_site:chararray, was_clicked:int, 
     cpc:int, country:chararray, placement:chararray); 

Sample record:

tablet C6 5/1/2013 3:47:10 movienet.example.com 0 102 USA TOP 
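
Can you provide the LOAD statement along with some sample records? – franklinsijo

@franklinsijo Added a sample record and the LOAD statement. –

Unable to reproduce the error. I added a few records with the same schema as ','-separated text. Works for me! – franklinsijo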

Answer

Pig version 0.15.

Input file data.txt:

tablet C6 5/1/2013 3:47:10 movienet.example.com 0 102 USA TOP 
tablet C6 5/1/2013 3:47:10 movienet.example.com 0 102 USA TOP 
tablet C6 5/1/2013 3:47:10 movienet.example.com 0 102 USA TOP 
tablet C6 5/1/2013 3:47:10 movienet.example.com 1 102 USA TOP 

Script:

data = LOAD '/path/data.txt' AS (keyword:chararray,campaign_id:chararray, 
    date:chararray, time:chararray,display_site:chararray, was_clicked:int, 
    cpc:int, country:chararray, placement:chararray); 
grouped = GROUP data BY keyword; 
by_keyword = FOREACH grouped 
{ 
    clicked = FILTER data BY was_clicked == 1; 
    total = COUNT(data.keyword); 
    GENERATE group, ((double)COUNT(clicked)/total) AS ctr; 
} 
DUMP by_keyword;

This gave me the correct result:

(tablet,0.25) 
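
For comparison, here is a minimal sketch of the same ratio without the nested block. It assumes `was_clicked` only ever holds 0 or 1, so that summing it counts the clicks; that assumption is not stated in the question. The alias `by_keyword_alt` is a hypothetical name introduced here, and `grouped` is the relation defined in the script above.

```pig
-- Assumption: was_clicked is always 0 or 1, so SUM(data.was_clicked) equals the number of clicks.
-- by_keyword_alt is a hypothetical alias; grouped comes from the script above.
by_keyword_alt = FOREACH grouped GENERATE
    group AS keyword,
    ((double)SUM(data.was_clicked) / COUNT(data)) AS ctr;
DUMP by_keyword_alt;
```

With no nested `clicked` alias there is nothing for the parser to treat as a scalar. Note that `SUM` ignores null values, so rows where `was_clicked` failed to parse would still be counted in the denominator.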

OK, I restarted PIG in the terminal, loaded the data file again, and then ran only the commands above. It worked! –