2016-12-04 45 views

Error in my Pig Latin script

I am trying to compute the median over a file in Pig. The file looks like this:

NewYork,-1 
NewYork,-5 
NewYork,-2 
NewYork,3 
NewYork,4 
NewYork,13 
NewYork,11 
Amsterdam,12 
Amsterdam,11 
Amsterdam,2 
Amsterdam,1 
Amsterdam,-1 
Amsterdam,-4 
Mumbai,1 
Mumbai,4 
Mumbai,5 
Mumbai,-2 
Mumbai,9 
Mumbai,-4 

The file is loaded and its data is grouped as follows:

wdata = load 'weatherdata' using PigStorage(',') as (city:chararray, temp:int); 
wdata_g = group wdata by city; 

I am trying to pull out all the temperature values for each city as follows:

wdata_tempmedian = foreach wdata_g { tu = wdata.temp as temp; ord = order tu by temp generate group, Median(ord); } 

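The data is sorted because the median has to be computed over values in sorted order. But I am getting the following error message, and I cannot figure out what is wrong: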

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 3, column 53> mismatched input 'as' expecting SEMI_COLON 

Any help is much appreciated.

Answer


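You missed the ';' after the ORDER ... BY temp statement. The parser error also points at the 'as' in the nested projection, so drop that alias; the projected field is still named temp.

wdata_tempmedian = FOREACH wdata_g { 
        tu = wdata.temp; 
        ord = ORDER tu BY temp; 
        GENERATE group, Median(ord); 
         } 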

OR, ordering the grouped bag directly and projecting temp afterwards:

wdata_tempmedian = FOREACH wdata_g { 
        ord = ORDER wdata BY temp; 
        GENERATE group, Median(ord.temp); 
         } 

Note: I am assuming you are using DataFu, since Pig has no built-in median function. Make sure the jar is registered correctly:

register /path/datafu-pig-incubating-1.3.1.jar 
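Putting the pieces together, a minimal end-to-end sketch might look like the following. It assumes the median UDF is DataFu's `datafu.pig.stats.Median` (which expects its input bag to be sorted) and that it is bound to the name `Median` with a `DEFINE`; neither the class name nor the `DEFINE` appears in the question, so treat both as assumptions. The jar path is the same placeholder as above, and the nested projection is written without an alias because the parser error in the question points at `as`.

```pig
-- Placeholder path from the answer; point it at your own DataFu jar.
REGISTER /path/datafu-pig-incubating-1.3.1.jar;

-- Assumption: the UDF class is datafu.pig.stats.Median (requires a sorted bag).
DEFINE Median datafu.pig.stats.Median();

-- Load and group exactly as in the question.
wdata = LOAD 'weatherdata' USING PigStorage(',') AS (city:chararray, temp:int);
wdata_g = GROUP wdata BY city;

-- Inside the nested block: project temp, sort it, then take the median.
wdata_tempmedian = FOREACH wdata_g {
        tu = wdata.temp;
        ord = ORDER tu BY temp;
        GENERATE group, Median(ord);
}

DUMP wdata_tempmedian;
```

Each statement inside the braces ends with its own ';', which is the separator the original one-line version was missing before `generate`.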

Yes, that worked. – Sidhartha