2017-05-16

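PostgreSQL: a customer's preferred product and second most preferred product

I'm very new to SQL (currently using PostgreSQL, but interested in learning about any SQL dialect) and am trying to figure out something that I think should be relatively straightforward.

I have a table with one row per customer transaction, and for each transaction I know what the customer bought. I'm interested in finding out which product is each customer's preferred choice, and then their second most preferred choice (and ultimately, in general, what the preferred second choice is when the preferred option is not available).

Here's a mock-up of what the data might look like: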

+-------------+----------------+
| Customer_id | Product bought |
+-------------+----------------+
| 1           | DVD            |
| 1           | DVD            |
| 1           | Blu-ray        |
| 1           | DVD            |
| 2           | DVD            |
| 2           | DVD            |
+-------------+----------------+

A successful result would look like this:

+-------------+--------------+--------------+
| Customer_id | Preferred #1 | Preferred #2 |
+-------------+--------------+--------------+
| 1           | DVD          | Blu-ray      |
| 2           | DVD          | $NULL$       |
+-------------+--------------+--------------+

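(As mentioned above, the end result, most likely produced in Python/R rather than in SQL, will be a look at general patterns such as "if preferred #1 is DVD, then preferred #2 is Blu-ray", "if preferred #1 is Blu-ray, then preferred #2 is a sandwich", and so on.)

Cheers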

Answer

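This is a combination of a per-group ranking problem and a rows-to-columns problem (the latter is sometimes also called a pivot).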

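The first step is to determine the two preferred products.

In your case, you need to combine a group by query with a window function.

The following query counts how often each customer bought each product: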

select customer_id,
       product_bought,
       count(*) as num_products
from sales
group by customer_id, product_bought
order by customer_id;
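
To try the queries in this answer against the sample data from the question, a table along the following lines can be used. This is only a sketch: the table name sales and the column names come from the queries here, while the column types are assumptions.

```sql
-- Assumed schema: the column types are a guess, only the names come from the queries.
create table sales (
    customer_id    integer,
    product_bought text
);

-- The six rows from the question's mock-up.
insert into sales (customer_id, product_bought) values
    (1, 'DVD'),
    (1, 'DVD'),
    (1, 'Blu-ray'),
    (1, 'DVD'),
    (2, 'DVD'),
    (2, 'DVD');
```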

This can be extended to include a rank based on how many times each product was bought:

select customer_id,
       product_bought,
       count(*) as num_products,
       dense_rank() over (partition by customer_id order by count(*) desc) as rnk
from sales
group by customer_id, product_bought
order by customer_id;

This returns the following result (based on your sample data):

customer_id | product_bought | num_products | rnk
------------+----------------+--------------+----
          1 | DVD            |            3 |   1
          1 | Blu-ray        |            1 |   2
          2 | DVD            |            2 |   1

We can't apply a where condition on the rnk column directly, so we need a derived table for that:

select customer_id, product_bought
from (
    select customer_id,
           product_bought,
           count(*) as num_products,
           dense_rank() over (partition by customer_id order by count(*) desc) as rnk
    from sales
    group by customer_id, product_bought
) t
where rnk <= 2
order by customer_id;

Now we need to convert the two rows per customer into columns. This can be done, for example, using a common table expression:

with preferred_products as (
    select *
    from (
        select customer_id,
               product_bought,
               count(*) as num_products,
               dense_rank() over (partition by customer_id order by count(*) desc) as rnk
        from sales
        group by customer_id, product_bought
    ) t
    where rnk <= 2
)
select p1.customer_id,
       p1.product_bought as "Product #1",
       p2.product_bought as "Product #2"
from preferred_products p1
    left join preferred_products p2 on p1.customer_id = p2.customer_id and p2.rnk = 2
where p1.rnk = 1

This then returns

customer_id | Product #1 | Product #2
------------+------------+-----------
          1 | DVD        | Blu-ray
          2 | DVD        |

The above is standard SQL and will work on any modern DBMS.
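
One thing to keep in mind: with dense_rank(), products a customer bought equally often share the same rank, so the self-join can return more than one row for such a customer. A possible variation, sketched below, uses row_number() with an arbitrary tie-breaker (alphabetical order of product_bought, which is an assumption, not something the question specifies) and pivots with conditional aggregation instead of a join. The CTE name ranked and the column rn are illustrative.

```sql
-- Sketch: one row per customer, even when purchase counts are tied.
-- Ties are broken alphabetically by product_bought (an arbitrary choice).
with ranked as (
    select customer_id,
           product_bought,
           row_number() over (partition by customer_id
                              order by count(*) desc, product_bought) as rn
    from sales
    group by customer_id, product_bought
)
select customer_id,
       -- each case expression matches at most one row per customer,
       -- so max() simply picks that value (or null if there is none)
       max(case when rn = 1 then product_bought end) as "Product #1",
       max(case when rn = 2 then product_bought end) as "Product #2"
from ranked
where rn <= 2
group by customer_id
order by customer_id;
```

For the sample data this should give the same two rows as the join-based query, with a null second product for customer 2.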

Online example: http://rextester.com/VAID15638
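
The follow-up analysis mentioned in the question ("if preferred #1 is DVD, then preferred #2 is ...") could also be done in SQL rather than in Python/R. The following is only a sketch under the same assumptions as the queries above; the names ranked, pairs, first_choice, second_choice and num_customers are made up for illustration, and ties are again broken alphabetically.

```sql
-- Sketch: for each top choice, count how many customers have
-- each product as their second choice.
with ranked as (
    select customer_id,
           product_bought,
           row_number() over (partition by customer_id
                              order by count(*) desc, product_bought) as rn
    from sales
    group by customer_id, product_bought
),
pairs as (
    -- inner join: customers without a second product are left out
    select p1.product_bought as first_choice,
           p2.product_bought as second_choice
    from ranked p1
        join ranked p2 on p2.customer_id = p1.customer_id and p2.rn = 2
    where p1.rn = 1
)
select first_choice,
       second_choice,
       count(*) as num_customers
from pairs
group by first_choice, second_choice
order by first_choice, num_customers desc;
```

Within each first_choice, the first row is then the most common second choice.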

Awesome, thanks. This helped a lot and does everything I wanted (I learned a lot too). Cheers. – Morridini
