PostgreSQL: customer's preferred product and second most popular product

I'm very new to SQL (currently using PostgreSQL, but interested in SQL knowledge in general) and am trying to figure out something that I think should be relatively straightforward.

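I have a table with one row per customer transaction, and for each transaction I know what the customer bought. I want to find out which product is each customer's preferred choice, followed by their second most preferred choice (and ultimately, in aggregate, which second choice is preferred when the first choice is unavailable).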

Below is what a mock-up of the data might look like:

+-------------+----------------+
| Customer_id | Product bought |
+-------------+----------------+
| 1           | DVD            |
| 1           | DVD            |
| 1           | Blu-ray        |
| 1           | DVD            |
| 2           | DVD            |
| 2           | DVD            |
+-------------+----------------+

A successful result would look like this:

+-------------+--------------+--------------+
| Customer_id | Preferred #1 | Preferred #2 |
+-------------+--------------+--------------+
| 1           | DVD          | Blu-ray      |
| 2           | DVD          | NULL         |
+-------------+--------------+--------------+

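(As mentioned above, the final result, most likely computed in Python/R rather than in SQL, would be general rules such as "if Preferred #1 is DVD, then Preferred #2 is Blu-ray", "if Preferred #1 is Blu-ray, then Preferred #2 is sandwiches", and so on.)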
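The follow-up aggregation described above (which second choice is most common for each first choice) could be sketched in Python roughly as follows. This is a minimal sketch, assuming the per-customer result has already been fetched as (customer_id, preferred #1, preferred #2) tuples; the `rows` data (customers 3 and 4 in particular) and the `second_choice_given_first` helper are hypothetical, and customers without a second choice (SQL NULL, i.e. `None`) are skipped.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical per-customer results: (customer_id, preferred #1, preferred #2).
# None stands for SQL NULL (no second preference).
rows: list[tuple[int, str, Optional[str]]] = [
    (1, "DVD", "Blu-ray"),
    (2, "DVD", None),
    (3, "DVD", "Blu-ray"),
    (4, "Blu-ray", "Sandwich"),
]


def second_choice_given_first(
    rows: list[tuple[int, str, Optional[str]]]
) -> dict[str, tuple[str, int]]:
    """For each first choice, return the most common second choice and how often it occurs."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for _customer_id, first, second in rows:
        if second is not None:  # skip customers who have no second preference
            counts[first][second] += 1
    # most_common(1) returns a one-element list holding the top (item, count) pair
    return {first: c.most_common(1)[0] for first, c in counts.items()}


rules = second_choice_given_first(rows)
for first, (second, n) in rules.items():
    print(f"if Preferred #1 is {first}, then Preferred #2 is {second} ({n} customers)")
```

Ties between equally common second choices are resolved by `Counter.most_common`, which keeps the first-seen item; a real analysis may want to report ties explicitly.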


Cheers


2017-05-16 Morridini

This is a combination of a greatest-n-per-group problem and a pivot problem (sometimes also called a crosstab).

The first step is to identify the two preferred products.

In your case, you need to combine a GROUP BY query with a window function.

The following query counts how often each customer bought each product:

select customer_id, 
     product_bought, 
     count(*) as num_products 
from sales 
group by customer_id, product_bought 
order by customer_id;
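To try this without a PostgreSQL server, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in database (an assumption for illustration; the query uses only standard SQL). It loads the mock-up data into a `sales` table with the answer's column names and runs the same aggregation. A secondary sort on `num_products desc` is added so the row order is deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("create table sales (customer_id integer, product_bought text)")
conn.executemany(
    "insert into sales values (?, ?)",
    [(1, "DVD"), (1, "DVD"), (1, "Blu-ray"), (1, "DVD"), (2, "DVD"), (2, "DVD")],
)

# Same aggregation as above: one row per (customer, product) with its purchase count
counts = conn.execute(
    """
    select customer_id, product_bought, count(*) as num_products
    from sales
    group by customer_id, product_bought
    order by customer_id, num_products desc
    """
).fetchall()

for row in counts:
    print(row)
```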

This grouping query can be extended to include a ranking by the number of times each product was bought:

select customer_id, 
     product_bought, 
     count(*) as num_products, 
     dense_rank() over (partition by customer_id order by count(*) desc) as rnk 
from sales 
group by customer_id, product_bought 
order by customer_id;

This returns the following result (based on your sample data):

 customer_id | product_bought | num_products | rnk
-------------+----------------+--------------+-----
           1 | DVD            |            3 |   1
           1 | Blu-ray        |            1 |   2
           2 | DVD            |            2 |   1
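One detail worth checking is how ties are ranked: `dense_rank()` gives products bought equally often the same rank, so a customer can end up with two products at `rnk = 1`, and a later self-join on rank would then return more than one row for that customer. The sketch below compares it with `row_number()`, using SQLite (3.25 or later, for window-function support) as a stand-in and a hypothetical customer 3 who bought one DVD and one Blu-ray. The tie-breaker `product_bought` in the `row_number()` ordering is an assumption; without one, the order among tied rows is arbitrary. The counts are computed in an inner subquery before ranking, which is equivalent to ranking by `count(*)` directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (customer_id integer, product_bought text)")
# Hypothetical customer 3 with a tie: one DVD and one Blu-ray
conn.executemany("insert into sales values (?, ?)", [(3, "DVD"), (3, "Blu-ray")])

ranked = conn.execute(
    """
    select customer_id, product_bought, num_products,
           dense_rank() over (partition by customer_id
                              order by num_products desc) as drnk,
           row_number() over (partition by customer_id
                              order by num_products desc, product_bought) as rn
    from (
        select customer_id, product_bought, count(*) as num_products
        from sales
        group by customer_id, product_bought
    ) c
    order by customer_id, rn
    """
).fetchall()

for row in ranked:
    print(row)  # both products share drnk = 1, but rn is 1 and 2
```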

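We can't apply a WHERE condition on the rnk column directly (window functions are evaluated after the WHERE clause), so we need a derived table: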

select customer_id, product_bought 
from (
    select customer_id, 
     product_bought, 
     count(*) as num_products, 
     dense_rank() over (partition by customer_id order by count(*) desc) as rnk 
    from sales 
    group by customer_id, product_bought 
) t 
where rnk <= 2 
order by customer_id;
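The restriction on filtering by `rnk` can be observed directly. In this sketch, SQLite (3.25 or later) stands in for PostgreSQL, so the exact error text differs from what PostgreSQL reports. Filtering on the ranking in the same query level is rejected, while the derived-table version returns the top two products per customer. The per-customer counts are computed in an inner subquery before ranking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (customer_id integer, product_bought text)")
conn.executemany(
    "insert into sales values (?, ?)",
    [(1, "DVD"), (1, "DVD"), (1, "Blu-ray"), (1, "DVD"), (2, "DVD"), (2, "DVD")],
)

# Per-customer purchase counts; a fixed string, so interpolating it below is safe
counts_sql = """
    select customer_id, product_bought, count(*) as num_products
    from sales
    group by customer_id, product_bought
"""

# Filtering on the window function's result at the same query level is rejected
where_failed = False
try:
    conn.execute(
        f"""
        select customer_id, product_bought,
               dense_rank() over (partition by customer_id order by num_products desc) as rnk
        from ({counts_sql}) c
        where rnk <= 2
        """
    ).fetchall()
except sqlite3.Error as exc:
    where_failed = True
    print("rejected:", exc)

# Wrapping the ranking in a derived table turns rnk into an ordinary column
top2 = conn.execute(
    f"""
    select customer_id, product_bought
    from (
        select customer_id, product_bought,
               dense_rank() over (partition by customer_id order by num_products desc) as rnk
        from ({counts_sql}) c
    ) t
    where rnk <= 2
    order by customer_id, rnk
    """
).fetchall()

print(top2)
```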

Now we need to turn the two rows for each customer into columns. This can be done, for example, with a common table expression:

with preferred_products as (
    select *
    from (
        select customer_id,
               product_bought,
               count(*) as num_products,
               dense_rank() over (partition by customer_id order by count(*) desc) as rnk
        from sales
        group by customer_id, product_bought
    ) t
    where rnk <= 2
)
select p1.customer_id,
       p1.product_bought as "Product #1",
       p2.product_bought as "Product #2"
from preferred_products p1
    left join preferred_products p2 on p1.customer_id = p2.customer_id and p2.rnk = 2
where p1.rnk = 1;

This then returns:

 customer_id | Product #1 | Product #2
-------------+------------+------------
           1 | DVD        | Blu-ray
           2 | DVD        |

The above is standard SQL and should work on any modern DBMS that supports window functions and common table expressions.

Online example: http://rextester.com/VAID15638
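A common alternative to the self-join for the pivot step is conditional aggregation: group the ranked rows by customer and pick each rank's product with `max(case ... end)`. This is a different technique from the answer's, shown only as a sketch. PostgreSQL also supports an aggregate `filter (where ...)` clause for this, but `case` is used here so the sketch runs on SQLite (3.25 or later), which stands in for PostgreSQL. It uses `row_number()` instead of `dense_rank()`, with `product_bought` as an assumed tie-breaker, so each customer has at most one product per rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (customer_id integer, product_bought text)")
conn.executemany(
    "insert into sales values (?, ?)",
    [(1, "DVD"), (1, "DVD"), (1, "Blu-ray"), (1, "DVD"), (2, "DVD"), (2, "DVD")],
)

result = conn.execute(
    """
    with ranked as (
        select customer_id, product_bought,
               row_number() over (partition by customer_id
                                  order by num_products desc, product_bought) as rnk
        from (
            select customer_id, product_bought, count(*) as num_products
            from sales
            group by customer_id, product_bought
        ) c
    )
    -- pivot: one output row per customer, one column per rank
    select customer_id,
           max(case when rnk = 1 then product_bought end) as "Product #1",
           max(case when rnk = 2 then product_bought end) as "Product #2"
    from ranked
    group by customer_id
    order by customer_id
    """
).fetchall()

for row in result:
    print(row)  # a missing second product comes back as None (SQL NULL)
```

Compared with the self-join, this reads the ranked rows once and extends to a third or fourth preference by adding one more `max(case ...)` column.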


2017-05-16 05:57:30

Awesome, thanks, this helped a lot and does everything I wanted (I learned a lot too). Cheers. – Morridini
