2016-04-06 64 views

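Select a subset of rows that exceed a percentage of total values

I have a table of customers, users and revenue similar to the following (in reality, thousands of records):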

Customer  User   Revenue
001       James  500
002       James  750
003       James  450
004       Sarah  100
005       Sarah  500
006       Sarah  150
007       Sarah  600
008       James  150
009       James  100

What I want to do is return only the top-spending customers that make up 80% of each user's total revenue.

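To do this manually, I would order James's customers by revenue, calculate each one's percentage of the total and a running total percentage, and then return only the records up to the point where the running total reaches 80%: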

Customer  User   Revenue  % of total  Running Total %
002       James  750      0.38        0.38
001       James  500      0.26        0.64
003       James  450      0.23        0.87   <- Greater than 80%, last record
008       James  150      0.08        0.95
009       James  100      0.05        1.00
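The manual procedure described above can be sketched in Python. This is only an illustration of the logic, not an answer from the thread: `rows` is the sample data from the question and `top_customers` is a hypothetical helper. It sorts one user's customers by revenue, accumulates the running share, and stops at the first row that reaches 80%.

```python
from fractions import Fraction

# Sample data from the question: (customer, user, revenue)
rows = [
    ("001", "James", 500), ("002", "James", 750), ("003", "James", 450),
    ("004", "Sarah", 100), ("005", "Sarah", 500), ("006", "Sarah", 150),
    ("007", "Sarah", 600), ("008", "James", 150), ("009", "James", 100),
]

def top_customers(rows, user, threshold=Fraction(4, 5)):
    """Hypothetical helper: return one user's customers, highest revenue first,
    up to and including the first row whose running share reaches the threshold."""
    mine = sorted((r for r in rows if r[1] == user), key=lambda r: r[2], reverse=True)
    total = sum(r[2] for r in mine)
    if total == 0:
        return []  # nothing to rank (unknown user or all-zero revenue)
    result, running = [], 0
    for customer, _, revenue in mine:
        running += revenue
        result.append(customer)
        # Exact fractions avoid floating-point rounding at the 80% boundary.
        if Fraction(running, total) >= threshold:
            break
    return result

print(top_customers(rows, "James"))  # ['002', '001', '003']
print(top_customers(rows, "Sarah"))  # ['007', '005']
```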

I've tried using a CTE, but so far I've come up blank. Is there a way to do this with a single query rather than manually in an Excel sheet?

Answer


SQL Server 2012+ only:

You can use a windowed SUM:

WITH cte AS 
(
    SELECT *, 
      1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY [User]) AS percentile, 
      1.0 * SUM(Revenue) OVER(PARTITION BY [User] ORDER BY [Revenue] DESC) 
       /SUM(Revenue) OVER(PARTITION BY [User]) AS running_percentile 
    FROM tab 
) 
SELECT * 
FROM cte 
WHERE running_percentile <= 0.8; 
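To see what this query computes without a SQL Server 2012 instance, here is a rough Python model of it. It is a sketch, not SQL Server itself, and `running_shares` is a hypothetical helper. It assumes SQL Server's default window frame when ORDER BY is given without ROWS/RANGE (RANGE UNBOUNDED PRECEDING AND CURRENT ROW), so rows with tied revenue share one running total. Like the query, which has no NULLIF guard, it assumes each user's total revenue is non-zero.

```python
from collections import defaultdict

# Sample data from the question: (customer, user, revenue)
rows = [
    ("001", "James", 500), ("002", "James", 750), ("003", "James", 450),
    ("004", "Sarah", 100), ("005", "Sarah", 500), ("006", "Sarah", 150),
    ("007", "Sarah", 600), ("008", "James", 150), ("009", "James", 100),
]

def running_shares(rows):
    """Hypothetical helper modelling
    SUM(Revenue) OVER (PARTITION BY [User] ORDER BY Revenue DESC) / per-user total.
    With the default RANGE frame, every row of the same user whose revenue is
    >= the current row's revenue is summed, so tied revenues share one value."""
    by_user = defaultdict(list)
    for _, user, revenue in rows:
        by_user[user].append(revenue)
    out = []
    for customer, user, revenue in rows:
        revenues = by_user[user]
        total = sum(revenues)
        running = sum(r for r in revenues if r >= revenue)
        out.append((customer, user, revenue, revenue / total, running / total))
    return out

# WHERE running_percentile <= 0.8
kept = sorted(c for c, _, _, _, rp in running_shares(rows) if rp <= 0.8)
print(kept)  # ['001', '002', '007']
```

This filter drops the row that crosses 80% (James's 003 at about 0.87 and Sarah's 005 at about 0.81), which is the gap the OP points out later in the thread.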



SQL Server 2008:

WITH cte AS 
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn 
    FROM t  
), cte2 AS 
(
    SELECT c.Customer, c.[User], c.[Revenue] 
      ,percentile   = 1.0 * Revenue/NULLIF(c3.s,0) 
      ,running_percentile = 1.0 * c2.s /NULLIF(c3.s,0) 
    FROM cte c 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM cte c2 
      WHERE c.[User] = c2.[User] 
      AND c2.rn <= c.rn) c2 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM cte c2 
      WHERE c.[User] = c2.[User]) AS c3 
) 
SELECT * 
FROM cte2 
WHERE running_percentile <= 0.8; 


Output:

╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User  ║ Revenue ║ percentile     ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║        2 ║ James ║     750 ║ 0.384615384615 ║ 0.384615384615     ║
║        1 ║ James ║     500 ║ 0.256410256410 ║ 0.641025641025     ║
║        7 ║ Sarah ║     600 ║ 0.444444444444 ║ 0.444444444444     ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝

EDIT 2:

"That looks almost there; the only snag is that it's missing the last row. James's third row takes him over 0.80, but it needs to be included."

WITH cte AS 
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn 
    FROM t  
), cte2 AS 
(
    SELECT c.Customer, c.[User], c.[Revenue] 
      ,percentile   = 1.0 * Revenue/NULLIF(c3.s,0) 
      ,running_percentile = 1.0 * c2.s /NULLIF(c3.s,0) 
    FROM cte c 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM cte c2 
      WHERE c.[User] = c2.[User] 
      AND c2.rn <= c.rn) c2 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM cte c2 
      WHERE c.[User] = c2.[User]) AS c3 
) 
SELECT a.* 
FROM cte2 a 
CROSS APPLY (SELECT MIN(running_percentile) AS rp 
      FROM cte2 
      WHERE running_percentile >= 0.8 
       AND cte2.[User] = a.[User]) AS s 
WHERE a.running_percentile <= s.rp; 


Output:

╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User  ║ Revenue ║ percentile     ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║        2 ║ James ║     750 ║ 0.384615384615 ║ 0.384615384615     ║
║        1 ║ James ║     500 ║ 0.256410256410 ║ 0.641025641025     ║
║        3 ║ James ║     450 ║ 0.230769230769 ║ 0.871794871794     ║
║        7 ║ Sarah ║     600 ║ 0.444444444444 ║ 0.444444444444     ║
║        5 ║ Sarah ║     500 ║ 0.370370370370 ║ 0.814814814814     ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝
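As a check on the output above (worked arithmetic from the sample data: James's revenue totals 1950 and Sarah's 1350), each user's rows stop at the first running share that reaches 0.8:

```latex
\begin{aligned}
\text{James:}\quad & \tfrac{750}{1950} \approx 0.385,\quad
  \tfrac{750+500}{1950} \approx 0.641,\quad
  \tfrac{750+500+450}{1950} \approx 0.872 \ge 0.8 \\
\text{Sarah:}\quad & \tfrac{600}{1350} \approx 0.444,\quad
  \tfrac{600+500}{1350} \approx 0.815 \ge 0.8
\end{aligned}
```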

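"Looks perfect. I translated it to my big table and it returns what I need. I spent a good 5 minutes going through it and still can't follow what you've done!"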

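SQL Server 2008 does not support everything in the OVER() clause (ORDER BY for aggregates such as SUM only arrived in SQL Server 2012), but ROW_NUMBER works.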

The first CTE just calculates each row's position within its user's group:

╔══════════╦═══════╦═════════╦════╗
║ Customer ║ User  ║ Revenue ║ rn ║
╠══════════╬═══════╬═════════╬════╣
║        2 ║ James ║     750 ║  1 ║
║        1 ║ James ║     500 ║  2 ║
║        3 ║ James ║     450 ║  3 ║
║        8 ║ James ║     150 ║  4 ║
║        9 ║ James ║     100 ║  5 ║
║        7 ║ Sarah ║     600 ║  1 ║
║        5 ║ Sarah ║     500 ║  2 ║
║        6 ║ Sarah ║     150 ║  3 ║
║        4 ║ Sarah ║     100 ║  4 ║
╚══════════╩═══════╩═════════╩════╝
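The numbering in this table is what ROW_NUMBER() OVER (PARTITION BY [User] ORDER BY Revenue DESC) produces. Below is a Python model of that step; `row_number` is a hypothetical helper. One caveat: SQL Server may number tied revenues in either order, whereas this sketch's stable sort keeps input order for ties.

```python
from collections import defaultdict

# Sample data from the question: (customer, user, revenue)
rows = [
    ("001", "James", 500), ("002", "James", 750), ("003", "James", 450),
    ("004", "Sarah", 100), ("005", "Sarah", 500), ("006", "Sarah", 150),
    ("007", "Sarah", 600), ("008", "James", 150), ("009", "James", 100),
]

def row_number(rows):
    """Hypothetical helper: number each user's rows 1, 2, 3, ... by descending revenue."""
    # Sort by user (the partition), then by revenue descending (the ORDER BY).
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    counters = defaultdict(int)
    numbered = []
    for customer, user, revenue in ordered:
        counters[user] += 1  # restarts at 1 for each new user
        numbered.append((customer, user, revenue, counters[user]))
    return numbered

for row in row_number(rows):
    print(row)
```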

The second CTE:

  • c2 calculates each user's running total based on rank (ROW_NUMBER)
  • c3 calculates the full sum per user
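The two CROSS APPLY subqueries can be modelled per row in Python. This is a sketch; the input is the numbered table above written as (customer, user, revenue, rn) tuples, and `second_cte` is a hypothetical helper. c2 sums the revenue of the same user's rows with rn less than or equal to the current row's rn, and c3 sums all of that user's revenue.

```python
# Output of the first CTE: (customer, user, revenue, rn)
cte = [
    ("002", "James", 750, 1), ("001", "James", 500, 2), ("003", "James", 450, 3),
    ("008", "James", 150, 4), ("009", "James", 100, 5),
    ("007", "Sarah", 600, 1), ("005", "Sarah", 500, 2), ("006", "Sarah", 150, 3),
    ("004", "Sarah", 100, 4),
]

def second_cte(cte):
    """Hypothetical helper mirroring cte2: (customer, user, revenue, percentile, running)."""
    out = []
    for customer, user, revenue, rn in cte:
        # c2: running total = this user's rows ranked at or above the current row
        c2 = sum(r for _, u, r, n in cte if u == user and n <= rn)
        # c3: this user's full total
        c3 = sum(r for _, u, r, _ in cte if u == user)
        # NULLIF(c3.s, 0): a zero total gives NULL (None) instead of a division error
        percentile = revenue / c3 if c3 else None
        running = c2 / c3 if c3 else None
        out.append((customer, user, revenue, percentile, running))
    return out

for row in second_cte(cte):
    print(row)
```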

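In the final query, the s subquery finds, for each user, the lowest running_percentile that is at least 80%. Every row at or below that value is kept, so the row that crosses 80% is included.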
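The filtering step can be sketched in Python as well. `keep_until_threshold` is a hypothetical helper, and its input rows are (customer, user, running_percentile) tuples taken from the sample data. For each user it takes the smallest running_percentile that is >= 0.8, then keeps every row at or below it.

```python
def keep_until_threshold(rows, threshold=0.8):
    """Hypothetical helper mirroring the final SELECT of EDIT 2."""
    # s subquery: MIN(running_percentile) WHERE running_percentile >= threshold, per user
    cutoff = {}
    for _, user, rp in rows:
        if rp >= threshold and (user not in cutoff or rp < cutoff[user]):
            cutoff[user] = rp
    # WHERE a.running_percentile <= s.rp; if a user has no qualifying row, MIN returns
    # NULL in SQL, the comparison is unknown, and none of that user's rows are kept
    return [c for c, user, rp in rows if user in cutoff and rp <= cutoff[user]]

# (customer, user, running_percentile) for the sample data
rows = [
    ("002", "James", 750 / 1950), ("001", "James", 1250 / 1950),
    ("003", "James", 1700 / 1950), ("008", "James", 1850 / 1950),
    ("009", "James", 1950 / 1950),
    ("007", "Sarah", 600 / 1350), ("005", "Sarah", 1100 / 1350),
    ("006", "Sarah", 1250 / 1350), ("004", "Sarah", 1350 / 1350),
]
print(keep_until_threshold(rows))  # ['002', '001', '003', '007', '005']
```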

EDIT 3:

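Using ROW_NUMBER is actually redundant: the running total can be computed by summing the same user's rows whose revenue is greater than or equal to the current row's revenue.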

WITH cte AS 
(
    SELECT c.Customer, c.[User], c.[Revenue] 
      ,percentile   = 1.0 * Revenue/NULLIF(c3.s,0) 
      ,running_percentile = 1.0 * c2.s /NULLIF(c3.s,0) 
    FROM t c 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM t c2 
      WHERE c.[User] = c2.[User] 
      AND c2.Revenue >= c.Revenue) c2 
    CROSS APPLY 
     (SELECT SUM(Revenue) AS s 
      FROM t c2 
      WHERE c.[User] = c2.[User]) AS c3 
) 
SELECT a.* 
FROM cte a 
CROSS APPLY (SELECT MIN(running_percentile) AS rp 
      FROM cte c2 
      WHERE running_percentile >= 0.8 
       AND c2.[User] = a.[User]) AS s 
WHERE a.running_percentile <= s.rp 
ORDER BY [User], Revenue DESC; 
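The same logic can be tried locally with Python's built-in sqlite3 module. This is an assumption-laden sketch: SQLite stands in for SQL Server, and because SQLite has no CROSS APPLY, the c2 and c3 subqueries are rewritten as correlated scalar subqueries. Only the filtering logic of EDIT 3 is reproduced, not its exact T-SQL syntax.

```python
import sqlite3

# In-memory SQLite database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Customer TEXT, [User] TEXT, Revenue INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("001", "James", 500), ("002", "James", 750), ("003", "James", 450),
        ("004", "Sarah", 100), ("005", "Sarah", 500), ("006", "Sarah", 150),
        ("007", "Sarah", 600), ("008", "James", 150), ("009", "James", 100),
    ],
)

# EDIT 3 rewritten for SQLite: c2 (running total) and c3 (per-user total)
# become correlated scalar subqueries instead of CROSS APPLY.
query = """
WITH cte AS (
    SELECT c.Customer, c.[User], c.Revenue,
           1.0 * (SELECT SUM(c2.Revenue) FROM t c2
                  WHERE c2.[User] = c.[User] AND c2.Revenue >= c.Revenue)
               / NULLIF((SELECT SUM(c3.Revenue) FROM t c3
                         WHERE c3.[User] = c.[User]), 0) AS running_percentile
    FROM t c
)
SELECT a.Customer, a.[User], a.Revenue
FROM cte a
WHERE a.running_percentile <= (SELECT MIN(b.running_percentile)
                               FROM cte b
                               WHERE b.running_percentile >= 0.8
                                 AND b.[User] = a.[User])
ORDER BY a.[User], a.Revenue DESC
"""
for row in conn.execute(query):
    print(row)
```

Because the running total compares revenue values rather than row numbers, two customers of the same user with equal revenue get the same running_percentile and are kept or dropped together.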



@bendataclear Please see the update. – lad2025


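Looks close; the only downside is that it's missing the last row. James's third row takes him over 0.80, but it needs to be included. Not a disaster if that isn't possible, though. – bendataclear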


@bendataclear Added :) – lad2025


In SQL Server 2012+, you would use a cumulative sum, which is much more efficient. In SQL Server 2008, you can do this with a correlated subquery or cross apply:

select t.*,
       t.Revenue*1.0/sum(t.Revenue) over (partition by t.[User]) as [% of Total],
       t2.RunningRevenue*1.0/sum(t.Revenue) over (partition by t.[User]) as [Running Total %]
from t cross apply
    (select sum(Revenue) as RunningRevenue
     from t t2
     where t2.Revenue >= t.Revenue and t2.[User] = t.[User]
    ) t2;

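Note: the *1.0 is there in case Revenue is stored as an integer. SQL Server performs integer division when both operands are integers, which would return 0 for both columns on almost every row.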
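The integer-division pitfall is easy to reproduce. As an illustration only, SQLite (through Python's built-in sqlite3 module) is used here in place of SQL Server; it also truncates when both operands are integers, and multiplying by 1.0 first switches it to floating-point division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both operands are integers, so the result is truncated to an integer.
print(conn.execute("SELECT 750 / 1950").fetchone()[0])  # 0

# Multiplying by 1.0 first makes the division floating-point (about 0.3846).
print(conn.execute("SELECT 750 * 1.0 / 1950").fetchone()[0])
```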

Edit:

Add where [User] = 'James' if you only want the results for James.


The [% of Total] column seems to work, but only for a single user; the running total seems to be all over the place. – bendataclear


@bendataclear . . . Your original question only had one user. Adjusting the running total for a single user is trivial. Much simpler than lad's answer. –


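The first sum around t.Revenue is not needed, and it won't work because there is no GROUP BY (or am I missing something?). Second, user should be written as [user], otherwise you get an error. Third, SUM() OVER() calculates the percentage of the overall total, not per user. And there is no filtering. – lad2025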