2016-04-26 60 views
2

MODE aggregate function

I have the following tables:

customers

customer_id name 
---------------- 
1   bob 
2   alice 
3   tim 

purchases

id customer_id item_bought 
-------------------------- 
1 1   hat 
2 1   shoes 
3 2   glasses 
3 2   glasses 
4 2   book 
5 3   shoes 
6 1   hat 

I want the following result:

customer_name item_bought_most_often 
------------------------------------ 
bob   hat 
alice   glasses 
tim   shoes 

I would do something like this (I have not actually tried it, it is just the idea):

SELECT customers.name AS customer_name, 
       MODE(item_bought) AS item_bought_most_often 
FROM customers 
INNER JOIN purchases USING (customer_id) 
GROUP BY customer_id, customers.name 

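However, the MODE aggregation function does not exist in Redshift.

It seems Redshift user defined functions are just regular scalar functions, not aggregate functions, so I don't think I can define it myself.

Is there any workaround?

Answers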

2

You can mimic mode() by using row_number():

select name, item_bought 
from (select c.name, p.item_bought, count(*) as cnt, 
             -- partition by c.name restarts the numbering for each customer 
             row_number() over (partition by c.name order by count(*) desc) as seqnum 
      from customers c join 
           purchases p 
           using (customer_id) 
      group by c.name, p.item_bought 
     ) cp 
where seqnum = 1; 
+0

Does Amazon Redshift allow referencing **cnt** at the same level: `select count(*) as cnt, row_number() over (order by cnt desc) as seqnum`? – lad2025

+0

@lad2025 . . . Arrrgh. Too much Google BigQuery lately. –

1

You can first COUNT how often each customer bought each item, then use the RANK() window function:

SELECT name AS customer_name, item_bought AS item_bought_most_often 
FROM(SELECT name,item_bought,RANK() OVER(PARTITION BY name ORDER BY cnt DESC) rnk 
    FROM (SELECT c.name, p.item_bought, COUNT(*) AS cnt 
      FROM customers c 
      JOIN purchases p 
      ON p.customer_id = c.customer_id 
      GROUP BY c.name, p.item_bought) AS s1) AS s2 
WHERE rnk = 1; 

LiveDemo

Output:

╔═══════════════╦════════════════════════╗
║ customer_name ║ item_bought_most_often ║
╠═══════════════╬════════════════════════╣
║ alice         ║ glasses                ║
║ bob           ║ hat                    ║
║ tim           ║ shoes                  ║
║ zoe           ║ pencil                 ║
║ zoe           ║ book                   ║
╚═══════════════╩════════════════════════╝

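Note:

RANK handles ties: when several items share the highest count for a customer, all of them are returned (as for zoe in the output above).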
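
To make the difference in tie handling concrete, here is a minimal sketch that runs the same nested query locally with Python's built-in sqlite3 module instead of Redshift. It assumes an SQLite build with window-function support (3.25 or later). The zoe rows (ids 7 and 8) are hypothetical data added only to create a tie; the helper name `most_bought` is also made up for this illustration.

```python
import sqlite3

# In-memory stand-in for the Redshift tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE purchases (id INTEGER, customer_id INTEGER, item_bought TEXT);
    INSERT INTO customers VALUES (1, 'bob'), (2, 'alice'), (3, 'tim'),
                                 (4, 'zoe');  -- zoe is hypothetical
    INSERT INTO purchases VALUES
        (1, 1, 'hat'), (2, 1, 'shoes'), (3, 2, 'glasses'), (3, 2, 'glasses'),
        (4, 2, 'book'), (5, 3, 'shoes'), (6, 1, 'hat'),
        (7, 4, 'pencil'), (8, 4, 'book');     -- hypothetical tie for zoe
""")

# Count per (customer, item) first, then number the counts per customer.
QUERY = """
    SELECT name, item_bought
    FROM (SELECT name, item_bought,
                 {fn}() OVER (PARTITION BY name ORDER BY cnt DESC) AS pos
          FROM (SELECT c.name, p.item_bought, COUNT(*) AS cnt
                FROM customers c
                JOIN purchases p ON p.customer_id = c.customer_id
                GROUP BY c.name, p.item_bought) AS s1) AS s2
    WHERE pos = 1
    ORDER BY name, item_bought
"""

def most_bought(fn):
    """Run the query with the given window function (RANK or ROW_NUMBER)."""
    return conn.execute(QUERY.format(fn=fn)).fetchall()

rank_rows = most_bought("RANK")              # keeps every tied item
row_number_rows = most_bought("ROW_NUMBER")  # keeps one item per customer

print(rank_rows)
print(row_number_rows)
```

With RANK, both of zoe's tied items survive the filter. With ROW_NUMBER, exactly one row per customer survives, and which of the tied items is kept is not defined unless a tie-breaker is added to the ORDER BY.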

+0

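I'm doing something similar. I really wish there were an aggregate function like FIRST(my_column) or MODE(my_column), or the ability to define one, but it doesn't exist. Another possibility is `SPLIT_PART(LISTAGG(id, ','), ',', 1)`, or `udf_mode(LISTAGG(id, ','))`, where udf_mode is a user-defined function that computes the mode from a string of comma-separated values. But these are all hacks. –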

+1

@pinouchon Based on the [doc](http://docs.aws.amazon.com/redshift/latest/dg/user-defined-functions.html): *'You can create a custom user-defined scalar function (UDF)'*. I don't see anything in the docs like PostgreSQL's [CREATE AGGREGATE](http://www.postgresql.org/docs/current/static/sql-createaggregate.html) for user-defined aggregate functions. Using `LISTAGG` together with udf_mode could work. – lad2025
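
The `udf_mode` idea from the comments could look roughly like the plain Python function below. This is only a sketch of the function body: the name `udf_mode` comes from the comment, while the `sep` parameter and the handling of empty input are assumptions. Registering it in Redshift as a scalar Python UDF is not shown here.

```python
from collections import Counter

def udf_mode(values, sep=","):
    """Return the most frequent item in a separator-delimited string.

    Returns None for empty or missing input (an assumption; the
    comment does not say what should happen in that case).
    """
    if not values:
        return None
    counts = Counter(values.split(sep))
    # most_common(1) yields a one-element list of (item, count) pairs.
    return counts.most_common(1)[0][0]

print(udf_mode("hat,shoes,hat"))
```

As the commenter says, this is a hack: an item that itself contains the separator would be split incorrectly, ties are resolved arbitrarily, and the whole group has to fit into the single string that LISTAGG produces.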