Select the n largest counts per category in Redshift

I want to select the X most common pairs for each group in a table. Consider the following table:

+------------+---------+
| identifier | city    |
+------------+---------+
| AB         | Seattle |
| AC         | Seattle |
| AC         | Seattle |
| AB         | Seattle |
| AD         | Seattle |
| AB         | Chicago |
| AB         | Chicago |
| AD         | Chicago |
| AD         | Chicago |
| BC         | Chicago |
+------------+---------+

Seattle, AB occurs 2x
Seattle, AC occurs 2x
Seattle, AD occurs 1x
Chicago, AB occurs 2x
Chicago, AD occurs 2x
Chicago, BC occurs 1x
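
For anyone who wants to reproduce these tallies, here is a minimal setup sketch. The table name tbl is borrowed from the first answer below; the column types and the alias cnt are assumptions, since the question does not state them.

```sql
-- Assumed schema: the question only names the two columns, not their types.
create table tbl (
    identifier varchar(8),
    city       varchar(32)
);

-- The ten sample rows from the question.
insert into tbl (identifier, city) values
    ('AB', 'Seattle'),
    ('AC', 'Seattle'),
    ('AC', 'Seattle'),
    ('AB', 'Seattle'),
    ('AD', 'Seattle'),
    ('AB', 'Chicago'),
    ('AB', 'Chicago'),
    ('AD', 'Chicago'),
    ('AD', 'Chicago'),
    ('BC', 'Chicago');

-- Occurrences of each (city, identifier) pair, most frequent first within a city.
select city, identifier, count(*) as cnt
from tbl
group by city, identifier
order by city, cnt desc, identifier;
```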

If I want to select the 2 most common per city, the result should be:

+------------+---------+
| identifier | city    |
+------------+---------+
| AB         | Seattle |
| AC         | Seattle |
| AB         | Chicago |
| AD         | Chicago |
+------------+---------+

Any help is appreciated. Thanks, Benni

2017-07-28 Benni

Possible duplicate of [Get top n records for each group of grouped results](https://stackoverflow.com/questions/12113699/get-top-n-records-for-each-group-of-grouped-results) – mato

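You can use count(*) in the ORDER BY of row_number() to rank each city/identifier combination by its number of occurrences, then select the top two.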

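select city, identifier
from (
    -- number each (city, identifier) pair within its city, most frequent first;
    -- identifier breaks ties so the numbering is deterministic
    select city, identifier,
           row_number() over (partition by city order by count(*) desc, identifier) as rnum_cnt
    from tbl
    group by city, identifier
) t
where rnum_cnt <= 2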
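
Note that row_number() always returns exactly two rows per city (or fewer, if a city has fewer pairs), breaking ties by identifier. If pairs tied at the cutoff should all be returned instead, rank() could be swapped in. This is a sketch of that variant, not part of the original answer; the alias rnk_cnt is mine.

```sql
select city, identifier
from (
    -- rank() gives tied counts the same rank and then skips ahead,
    -- so a tie at the cutoff can return more than two rows per city
    select city, identifier,
           rank() over (partition by city order by count(*) desc) as rnk_cnt
    from tbl
    group by city, identifier
) t
where rnk_cnt <= 2;
```

On the sample data this returns the same four rows as the row_number() version, because no pair is tied exactly at the cutoff.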

2017-07-28 00:41:35

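You can't use count(*) inside the partition, at least in Redshift; the count should be done in a subquery – AlexYes

@AlexYes It looks like you can. The query in the answer gave me the correct result. Also, the [documentation](http://docs.aws.amazon.com/redshift/latest/dg/r_Window_function_synopsis.html) says that expressions are allowed in the order list. –

@DmitriiI. Interesting! When I read the documentation I was only thinking of scalar expressions; I didn't know Redshift was that smart :) Thanks! – AlexYes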
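
For reference, the subquery form suggested in the first comment might look like the sketch below. It computes the counts first and ranks them in an outer query, which avoids placing an aggregate inside the window's ORDER BY. The table name tbl comes from the answer; the aliases cnt, counts and ranked are assumptions added for illustration.

```sql
select city, identifier
from (
    -- step 2: number the pairs within each city by their precomputed count
    select city, identifier,
           row_number() over (partition by city order by cnt desc, identifier) as rnum_cnt
    from (
        -- step 1: count occurrences of each (city, identifier) pair
        select city, identifier, count(*) as cnt
        from tbl
        group by city, identifier
    ) counts
) ranked
where rnum_cnt <= 2;
```

Both forms should give the same rows; this one is just more explicit about the order of evaluation.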

Using a WITH clause:

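    with
        -- occurrences of each (identifier, city) pair
        _counts as (
            select
                identifier,
                city,
                count(*) as city_id_count
            from
                t1
            group by
                identifier,
                city
        ),

        -- attach the highest count found in each city
        _counts_and_max as (
            select
                identifier,
                city,
                city_id_count,
                max(city_id_count) over (partition by city) as city_max_count
            from
                _counts
        )

    -- keep the pairs whose count equals the city's maximum; this returns every
    -- identifier tied for first place in a city, not a fixed top n
    select
        identifier,
        city
    from
        _counts_and_max
    where
        city_id_count = city_max_count
    ;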
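
The query above keeps only the identifiers whose count equals the city's maximum, which happens to match the expected output because two identifiers tie for first place in each city. One possible way to extend the same WITH structure to the n highest count levels is to replace max() with dense_rank(). This is a sketch under that assumption; the names _counts_ranked and count_rank are mine.

```sql
with
    _counts as (
        select identifier, city, count(*) as city_id_count
        from t1
        group by identifier, city
    ),

    -- dense_rank() numbers the distinct count levels within each city: 1, 2, 3, ...
    _counts_ranked as (
        select
            identifier,
            city,
            dense_rank() over (partition by city order by city_id_count desc) as count_rank
        from _counts
    )

select identifier, city
from _counts_ranked
where count_rank <= 1   -- 1 reproduces the max() query; raise it to keep more count levels
;
```

Because dense_rank() works on count levels rather than rows, a larger cutoff can return more than n rows per city when counts are tied.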

2017-07-28 10:03:56
