2016-09-29

Implementing UPPER, TRIM and REPLACE in Apache Pig

I am new to Pig. I tried to implement my Pig script in two ways.

I.

data = LOAD 'sample2.txt' USING PigStorage(',') as(campaign_id:chararray,date:chararray,time:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int,keyword:chararray); 

distinct_data = DISTINCT data; 

val = foreach distinct_data generate campaign_id,date,time,UPPER(keyword),display_site,placement,was_clicked,cpc; 

val1 = foreach val generate campaign_id,date,time,TRIM(keyword),display_site,placement,was_clicked,cpc; 

val2 = foreach val1 generate campaign_id,REPLACE(date, '-', '/'),time,keyword,display_site,placement,was_clicked,cpc; 

dump val2; 

I get this error:

2016-09-29 02:45:40,826 INFO org.apache.pig.Main: Apache Pig version 0.10.0-cdh4.2.1 (rexported) compiled Apr 22 2013, 12:04:54
2016-09-29 02:45:40,827 INFO org.apache.pig.Main: Logging error messages to: /home/training/training_materials/analyst/exercises/pig_etl/pig_1475131540824.log
2016-09-29 02:45:42,371 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 1025: Invalid field projection. Projected field [keyword] does not exist in schema: campaign_id:chararray,date:chararray,time:chararray,org.apache.pig.builtin.upper_keyword_12:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int. Details at logfile: /home/hduser/pig_etl/pig_1475131540824.log

But when I combine UPPER, TRIM and REPLACE in a single statement, it works:

II.

data = LOAD 'sample2.txt' USING PigStorage(',') as(campaign_id:chararray,date:chararray,time:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int,keyword:chararray); 

distinct_data = DISTINCT data; 

val = foreach distinct_data generate campaign_id,REPLACE(date, '-', '/'),time,TRIM(UPPER(keyword)),display_site,placement,was_clicked,cpc; 
dump val; 

So I would just like someone to explain why my first approach does not work and what the error message means.

Answer

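By the time you build val1, val no longer has a field called "keyword". UPPER(keyword) was generated without an alias, so Pig gave that column an auto-generated name (org.apache.pig.builtin.upper_keyword_12 in your error message), and TRIM(keyword) in val1 cannot find it.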

Note: whenever you apply a function, give the result an alias with "as". That way you can avoid errors like this one.

It is also a good habit to run describe on a relation before you create a new one from it, so that the schema is clear to you.
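As a rough sketch of that describe check (using the same input file and schema as the question; the relation name val_named is made up for illustration), you could compare the schema with and without an alias. In the Pig version from the question, the first describe should list an auto-generated name in place of keyword, similar to the one in the error message; the exact name may differ between runs and versions.

```pig
-- Sketch only: same LOAD statement and schema as in the question
data = LOAD 'sample2.txt' USING PigStorage(',') as(campaign_id:chararray,date:chararray,time:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int,keyword:chararray);

-- No alias on UPPER(keyword): the field is no longer called keyword
val = foreach data generate campaign_id,UPPER(keyword);
describe val;

-- With an alias the field keeps a predictable name
-- (val_named is a hypothetical relation name, not from the original post)
val_named = foreach data generate campaign_id,UPPER(keyword) as keyword;
describe val_named;
```

Running describe after each foreach step shows where a field name changes before the next statement tries to use it.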

The solution would be:

data = LOAD 'sample2.txt' USING PigStorage(',') as(campaign_id:chararray,date:chararray,time:chararray,display_site:chararray,placement:chararray,was_clicked:int,cpc:int,keyword:chararray); 

distinct_data = DISTINCT data; 

val = foreach distinct_data generate campaign_id,date,time,UPPER(keyword) as keyword,display_site,placement,was_clicked,cpc; 

val1 = foreach val generate campaign_id,date,time,TRIM(keyword) as keyword,display_site,placement,was_clicked,cpc; 

val2 = foreach val1 generate campaign_id,REPLACE(date, '-', '/') as date,time,keyword,display_site,placement,was_clicked,cpc; 

dump val2; 
Thanks @ankur, and good suggestion. I will definitely use describe from now on. – curious