2010-05-18 84 views
2

Get the latest row, grouped by a column

I have a large dataset of sent emails and their status codes.

ID Recipient   Date  Status 
1 [email protected] 01/01/2010  1 
2 [email protected] 02/01/2010  1 
3 [email protected] 01/01/2010  1 
4 [email protected] 02/01/2010  2 
5 [email protected] 03/01/2010  1 
6 [email protected] 01/01/2010  1 
7 [email protected] 02/01/2010  2 

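In this example:

  • all of the emails sent to the first recipient have a status of 1
  • the middle email (by date) sent to the second recipient has a status of 2, but the latest one has a status of 1
  • the last email sent to the third recipient has a status of 2

I need to retrieve a count of all the emails sent to each person, together with the latest status code for that person.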

The first part is fairly simple:

SELECT Recipient, Count(*) EmailCount 
FROM Messages 
GROUP BY Recipient 
ORDER BY Recipient 

This gives me:

Recipient   EmailCount 
[email protected] 2 
[email protected] 3 
[email protected] 2 

How can I get the latest status code as well?

The final result should be:

Recipient   EmailCount LastStatus 
[email protected]   2   1 
[email protected]    3   1 
[email protected]   2   2 

Thanks.

(The server is Microsoft SQL Server 2008, and the query is run through an OleDbConnection on the .NET platform.)

+1

Can several emails be received at the same time? How do you want to handle two emails that have the same date but different statuses? – 2010-05-18 16:18:28

+0

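The timestamps are actually of high enough resolution that this won't be a problem, and even if it were, "whatever SQL returns from its ORDER BY" is good enough. – Cylindric 2010-05-18 16:44:27

Answers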

4

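This is an example of a "greatest-n-per-group" query. I think it is easiest to understand if you break it into two subqueries and join the results.

The first subquery is the one you already have.

The second subquery uses the window function ROW_NUMBER to number each recipient's emails, starting at 1 for the most recent, then 2, 3, and so on.

The results of the first query are then joined to the rows from the second query that have row number 1, i.e. the most recent ones. Doing it this way guarantees that you get only one row per recipient even when there are ties.

Here is the query: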

SELECT T1.Recipient, T1.EmailCount, T2.Status FROM 
(
    SELECT Recipient, COUNT(*) AS EmailCount 
    FROM Messages 
    GROUP BY Recipient 
) T1 
JOIN 
(
    SELECT 
     Recipient, 
     Status, 
     ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date Desc) AS rn 
    FROM Messages 
) T2 
ON T1.Recipient = T2.Recipient AND T2.rn = 1 

This gives the following results:

Recipient   EmailCount Status 
[email protected] 2   2  
[email protected] 2   1  
[email protected]  3   1  
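
To try the ROW_NUMBER approach without a SQL Server instance, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. Several things are assumptions rather than part of the original post: SQLite (3.25 or later, for window functions) stands in for SQL Server 2008, the `example.com` addresses are placeholders for the redacted ones, the dates are stored as ISO text so that they sort chronologically, and the names `SAMPLE_ROWS`, `LATEST_STATUS_SQL` and `latest_status_per_recipient` are made up for illustration. An `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# Placeholder data: the addresses are redacted in the post, so a@, b@ and c@
# stand in for the three recipients. Dates are ISO text so they sort correctly.
SAMPLE_ROWS = [
    (1, "a@example.com", "2010-01-01", 1),
    (2, "a@example.com", "2010-01-02", 1),
    (3, "b@example.com", "2010-01-01", 1),
    (4, "b@example.com", "2010-01-02", 2),
    (5, "b@example.com", "2010-01-03", 1),
    (6, "c@example.com", "2010-01-01", 1),
    (7, "c@example.com", "2010-01-02", 2),
]

# Same shape as the query above: counts per recipient joined to the row
# numbered 1 (the most recent) within each recipient's partition.
LATEST_STATUS_SQL = """
SELECT T1.Recipient, T1.EmailCount, T2.Status
FROM (
    SELECT Recipient, COUNT(*) AS EmailCount
    FROM Messages
    GROUP BY Recipient
) T1
JOIN (
    SELECT Recipient, Status,
           ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date DESC) AS rn
    FROM Messages
) T2
ON T1.Recipient = T2.Recipient AND T2.rn = 1
ORDER BY T1.Recipient
"""


def latest_status_per_recipient(rows):
    """Load rows into an in-memory Messages table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Messages ("
            "ID INTEGER PRIMARY KEY, Recipient TEXT, Date TEXT, Status INTEGER)"
        )
        conn.executemany("INSERT INTO Messages VALUES (?, ?, ?, ?)", rows)
        return conn.execute(LATEST_STATUS_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_status_per_recipient(SAMPLE_ROWS):
        print(row)
```

Each returned tuple is (Recipient, EmailCount, Status), one per recipient. On SQL Server the query text itself is the part that matters; the Python wrapper is only scaffolding for experimenting.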
+0

Very nice! Thank you very much. – Cylindric 2010-05-18 17:06:57

0

You can use a ranking function for this. Something like (untested):

WITH MyResults AS 
(
    SELECT Recipient, Status, ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY [date] DESC) AS [row_number]
    FROM Messages 
) 
SELECT MyResults.Recipient, MyCounts.EmailCount, MyResults.Status 
FROM (
    SELECT Recipient, Count(*) EmailCount 
    FROM Messages 
    GROUP BY Recipient 
) MyCounts 
INNER JOIN MyResults 
ON MyCounts.Recipient = MyResults.Recipient 
WHERE MyResults.[row_number] = 1 
2

It's not very pretty, but I would probably just use a couple of subqueries:

SELECT Recipient, 
    COUNT(*) EmailCount, 
    (SELECT Status 
    FROM Messages M2 
    WHERE Recipient = M.Recipient 
     AND Date = (SELECT MAX(Date) 
        FROM Messages 
        WHERE Recipient = M2.Recipient)) 
FROM Messages M 
GROUP BY Recipient 
ORDER BY Recipient 
2
SELECT 
    M.Recipient, 
    C.EmailCount, 
    M.Status 
FROM 
    (
    SELECT Recipient, Count(*) EmailCount 
    FROM Messages 
    GROUP BY Recipient 
    ) C 
    JOIN 
    (
    SELECT Recipient, MAX(Date) AS LastDate 
    FROM Messages 
    GROUP BY Recipient 
    ) MD ON C.Recipient = MD.Recipient 
    JOIN 
    Messages M ON MD.Recipient = M.Recipient AND MD.LastDate = M.Date 
ORDER BY 
    Recipient 

I've found that aggregates mostly scale better than ranking functions.
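
One behavioural difference is worth checking before choosing between the two forms: what happens when two messages to the same recipient share the latest date. The sketch below is an illustration under assumptions, not a benchmark. It uses an in-memory SQLite database (3.25 or later) as a stand-in for SQL Server, a placeholder address, and made-up names (`TIED_ROWS`, `MAX_DATE_JOIN_SQL`, `ROW_NUMBER_JOIN_SQL`, `run_query`). It runs the MAX(Date) join and the ROW_NUMBER join over the same tied data.

```python
import sqlite3

# Hypothetical data: messages 2 and 3 share the latest date but differ in status.
TIED_ROWS = [
    (1, "a@example.com", "2010-01-01", 1),
    (2, "a@example.com", "2010-01-02", 1),
    (3, "a@example.com", "2010-01-02", 2),
]

# Aggregate form: join the counts to MAX(Date), then back to Messages.
MAX_DATE_JOIN_SQL = """
SELECT M.Recipient, C.EmailCount, M.Status
FROM (SELECT Recipient, COUNT(*) AS EmailCount
      FROM Messages GROUP BY Recipient) C
JOIN (SELECT Recipient, MAX(Date) AS LastDate
      FROM Messages GROUP BY Recipient) MD
  ON C.Recipient = MD.Recipient
JOIN Messages M
  ON MD.Recipient = M.Recipient AND MD.LastDate = M.Date
ORDER BY M.Recipient
"""

# Ranking form: keep only the row numbered 1 in each recipient's partition.
ROW_NUMBER_JOIN_SQL = """
SELECT C.Recipient, C.EmailCount, R.Status
FROM (SELECT Recipient, COUNT(*) AS EmailCount
      FROM Messages GROUP BY Recipient) C
JOIN (SELECT Recipient, Status,
             ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY Date DESC) AS rn
      FROM Messages) R
  ON C.Recipient = R.Recipient AND R.rn = 1
ORDER BY C.Recipient
"""


def run_query(sql, rows):
    """Load rows into an in-memory Messages table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Messages ("
            "ID INTEGER PRIMARY KEY, Recipient TEXT, Date TEXT, Status INTEGER)"
        )
        conn.executemany("INSERT INTO Messages VALUES (?, ?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print("MAX(Date) join:", run_query(MAX_DATE_JOIN_SQL, TIED_ROWS))
    print("ROW_NUMBER join:", run_query(ROW_NUMBER_JOIN_SQL, TIED_ROWS))
```

With tied dates, the MAX(Date) join returns one row per tied message (two rows here), while the ROW_NUMBER join returns a single row whose status is whichever tied message happened to be numbered first. If the timestamps are effectively unique, as the asker says in the comments, both forms give the same result.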

+0

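+1, that's my experience too. In order of decreasing readability and increasing performance: ranking functions -> aggregates -> cross apply with CTE. – Andomar 2010-05-18 16:59:27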

1

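You can't easily do this as a single query, because count(*) is a group function while the latest status comes from a specific row. Here is a query to get the latest status for each user: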

SELECT M.Recipient, M.Status FROM Messages M 
WHERE M.Date = (SELECT MAX(SUB.Date) FROM MESSAGES SUB 
    WHERE SUB.Recipient = M.Recipient)