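Impala analytic functions in WHERE clause
The basic premise of the question is that I have some huge tables in Hadoop and I need to pull a sample from each month. I've mocked up the example below to show the sort of thing I'm after, but obviously it's not real data.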
--Create the table
CREATE TABLE exp_dqss_team.testranking (
Name STRING,
Age INT,
Favourite_Cheese STRING
) STORED AS PARQUET;
--Put some data in
INSERT INTO TABLE exp_dqss_team.testranking
VALUES (
('Tim', 33, 'Cheddar'),
('Martin', 49, 'Gorgonzola'),
('Will', 39, 'Brie'),
('Bob', 63, 'Cheddar'),
('Bill', 35, 'Brie'),
('Ben', 42, 'Gorgonzola'),
('Duncan', 55, 'Brie'),
('Dudley', 28, 'Cheddar'),
('Edmund', 27, 'Brie'),
('Baldrick', 29, 'Gorgonzola'));
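What I want to get is something like the two youngest people in each cheese category. The query below ranks people by age within each cheese category, but it doesn't limit the result to the top two: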
SELECT RANK() OVER(PARTITION BY favourite_cheese ORDER BY age asc) AS rank_my_cheese, favourite_cheese, name, age
FROM exp_dqss_team.testranking;
If I add a WHERE clause, it gives me the following error:
WHERE clause must not contain analytic expressions
SELECT RANK() OVER(PARTITION BY favourite_cheese ORDER BY age asc) AS rank_my_cheese, favourite_cheese, name, age
FROM exp_dqss_team.testranking
WHERE RANK() OVER(PARTITION BY favourite_cheese ORDER BY age asc) <3;
Is there a better way to do this than creating a table of all the rankings and then selecting from it with a WHERE clause on the rank?
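One common way around this restriction, sketched here against the mocked-up table above, is to compute the rank in an inline view (a subquery in the FROM clause) and filter on its alias in the outer query. No intermediate table is needed. The alias `ranked` is just a name picked for this example.

```sql
-- Compute the analytic function in an inline view, then filter on
-- its alias in the outer query, where a plain WHERE is allowed.
SELECT rank_my_cheese, favourite_cheese, name, age
FROM (
  SELECT RANK() OVER (PARTITION BY favourite_cheese ORDER BY age ASC) AS rank_my_cheese,
         favourite_cheese,
         name,
         age
  FROM exp_dqss_team.testranking
) ranked  -- "ranked" is an arbitrary alias chosen for this sketch
WHERE rank_my_cheese < 3;
```

Note that RANK() gives tied ages the same rank, so a category could return more than two rows if there are ties. If exactly two rows per category are needed, ROW_NUMBER() could be swapped in for RANK().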
Thanks, yes, that works. I think I may have been overthinking it!