2013-04-28

Hive: sort, then insert into a new table

I have a large data file that looks like this:

    1 6
    1 6
    2 7
    3 2
    3 6
    1 7
    1 9
    2 9
    1 5
    3 9
    3 1
    2 8

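I want to group the data by the first column, compute the average of the second column for each value of the first column, and then sort those groups by that average (highest first). So the output should be: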

    2 8
    1 6.6
    3 4.5
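As a quick sanity check of the expected output, here is a minimal plain-Python sketch (not Hive) that groups the sample rows by the first column, averages the second column, and sorts the groups by that average in descending order. The rows list copies the sample data above; group_averages is a hypothetical helper name used only for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (first column, second column)
rows = [(1, 6), (1, 6), (2, 7), (3, 2), (3, 6), (1, 7),
        (1, 9), (2, 9), (1, 5), (3, 9), (3, 1), (2, 8)]


def group_averages(pairs):
    """Return (key, mean of values) pairs sorted by mean, highest first."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    averages = [(key, sums[key] / counts[key]) for key in sums]
    # Sort on the average, descending (the equivalent of ORDER BY avgb DESC)
    return sorted(averages, key=lambda kv: kv[1], reverse=True)


for key, avg in group_averages(rows):
    print(key, avg)
```

This prints 2 8.0, 1 6.6 and 3 4.5: key 1 averages (6 + 6 + 7 + 9 + 5) / 5 = 6.6, key 2 averages (7 + 9 + 8) / 3 = 8, and key 3 averages (2 + 6 + 9 + 1) / 4 = 4.5.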

My code currently looks like this, and it doesn't work:

    -- input table: tab-separated (a, b) pairs
    CREATE EXTERNAL TABLE as (a STRING, b INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION 's3n://myfolder/hive';

    -- output table (note: same LOCATION as the input table above)
    CREATE EXTERNAL TABLE output(a STRING, avgb DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION 's3n://myfolder/hive';

    -- load the raw file, then write the per-key averages into output
    load data inpath "s3n://myfolder/file.txt" into TABLE as;
    insert overwrite table output select a, avg(b) from as group by a order by avg(b) DESC limit 1000;

I should note that the following does work, but on its own it doesn't cover the ORDER BY and INSERT steps I need:

    select a, avg(b) from as group by a;

When I try:

    select a, avg(b) from as group by a order by avg(b);

I get "FAILED: Error in semantic analysis: Line 1:66 Invalid table alias or column reference 'b': (possible column names are: _col0, _col1)".

Answer

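Just move the aggregation out into a subquery. The ORDER BY only sees the columns produced by the SELECT (the _col0 and _col1 named in the error), so give the average an alias inside the subquery and sort on that alias: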

    -- aggregate in the subquery, then sort on the alias, highest average first
    select a, avgb
    from (select a, avg(b) as avgb from as group by a) as t
    order by avgb desc;
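The subquery pattern can be tried locally without a Hive cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive. This is an assumption for illustration only: SQLite is more permissive than Hive and would also accept ORDER BY avg(b) directly, so it cannot reproduce the error above. The table name src is a hypothetical replacement for the question's as (a reserved word), and because SQLite has no INSERT OVERWRITE, the sketch emulates it with a DELETE followed by INSERT ... SELECT.

```python
import sqlite3

# Sample rows from the question as (a, b); a is text to mirror the STRING column
rows = [("1", 6), ("1", 6), ("2", 7), ("3", 2), ("3", 6), ("1", 7),
        ("1", 9), ("2", 9), ("1", 5), ("3", 9), ("3", 1), ("2", 8)]

conn = sqlite3.connect(":memory:")
# Hypothetical table names: "src" replaces the question's "as"
conn.execute("CREATE TABLE src (a TEXT, b INTEGER)")
conn.execute("CREATE TABLE output (a TEXT, avgb REAL)")
conn.executemany("INSERT INTO src VALUES (?, ?)", rows)

# Aggregate in a subquery, then sort on the alias in the outer query
query = """
    SELECT a, avgb
    FROM (SELECT a, AVG(b) AS avgb FROM src GROUP BY a) AS t
    ORDER BY avgb DESC
"""

# SQLite has no INSERT OVERWRITE, so emulate it: clear the table, then insert
conn.execute("DELETE FROM output")
conn.execute("INSERT INTO output (a, avgb) " + query)
conn.commit()

# Stored rows have no guaranteed order, so sort again when reading back
results = conn.execute("SELECT a, avgb FROM output ORDER BY avgb DESC").fetchall()
for a, avgb in results:
    print(a, avgb)
```

Because a relational table does not promise to return rows in the order they were inserted, the sketch re-applies ORDER BY when reading output back rather than relying on the order used during the insert.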