Duplicate email addresses with an ID column

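My table contains duplicate email addresses. Each row for an email address has a unique create date and a unique ID. For each email address, I want to identify the row with the most recent create date and its associated ID, and show the IDs of the duplicates together with their create dates. I would like the query to display the result in the following column format: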

Column 1: EmailAddress
Column 2: IDKeep
Column 3: CreateDateofIDKeep
Column 4: DuplicateID
Column 5: CreateDateofDuplicateID

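Note: in some cases an email address has more than two duplicates. I would like the query to show each additional duplicate on a new row, repeating EmailAddress and IDKeep in those cases.

I have tried piecing together different queries found here, to no avail. I am currently at a loss; any help or direction would be greatly appreciated.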

2015-04-02 sqlbg

RazorSQL is an interface, not a database. Which database are you using? – 2015-04-02 01:50:36

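Complex queries are best solved by breaking them into pieces and working step by step.

First, let's write a query that finds the row we want to keep for each email: find the most recent create date per email, then join back to the table to get the Id: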

select x.Email, x.CreateDate, x.Id 
from myTable x 
join (
    select Email, max(CreateDate) as CreateDate 
    from myTable 
    group by Email 
) y on x.Email = y.Email and x.CreateDate = y.CreateDate
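
To check this first step in isolation, here is a minimal sketch that runs the query above against a small in-memory SQLite table from Python. SQLite, the sample rows, and the helper name `latest_row_per_email` are assumptions made purely for illustration (the thread never settles which database is in use), and an `order by` is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical sample data: (Id, Email, CreateDate). Dates are ISO strings,
# so max() and comparisons order them chronologically.
SAMPLE_ROWS = [
    (1, "a@example.com", "2015-01-01"),
    (2, "a@example.com", "2015-02-01"),
    (3, "a@example.com", "2015-03-01"),
    (4, "b@example.com", "2015-01-15"),
    (5, "c@example.com", "2015-01-10"),
    (6, "c@example.com", "2015-01-20"),
]


def latest_row_per_email(rows):
    """Return (Email, CreateDate, Id) for the newest row of every email."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table myTable (Id integer primary key, Email text, CreateDate text)"
    )
    conn.executemany(
        "insert into myTable (Id, Email, CreateDate) values (?, ?, ?)", rows
    )
    result = conn.execute(
        """
        select x.Email, x.CreateDate, x.Id
        from myTable x
        join (
            select Email, max(CreateDate) as CreateDate
            from myTable
            group by Email
        ) y on x.Email = y.Email and x.CreateDate = y.CreateDate
        order by x.Email
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_row_per_email(SAMPLE_ROWS):
        print(row)
```

Note that this step returns a row for every email, including ones that are not duplicated (b@example.com here); those are filtered out later when the duplicate rows are joined in.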

OK, now let's write a query that gets the duplicated email addresses:

select Email 
from myTable 
group by Email 
having count(*) > 1

Next, join this query back to the table to get the key of every row whose email is duplicated:

select x.Email, x.Id, x.CreateDate 
from myTable x 
join (
    select Email 
    from myTable 
    group by Email 
    having count(*) > 1 
) y on x.Email = y.Email

Great. Now all that is left is to join the first query with this one to get our result:

select keep.Email, keep.Id as IdKeep, keep.CreateDate as CreateDateOfIdKeep,
    dup.Id as DuplicateId, dup.CreateDate as CreateDateOfDuplicateId
from (
    select x.Email, x.CreateDate, x.Id
    from myTable x
    join (
        select Email, max(CreateDate) as CreateDate
        from myTable
        group by Email
    ) y on x.Email = y.Email and x.CreateDate = y.CreateDate
) keep
join (
    select x.Email, x.Id, x.CreateDate
    from myTable x
    join (
        select Email
        from myTable
        group by Email
        having count(*) > 1
    ) y on x.Email = y.Email
) dup on keep.Email = dup.Email and keep.Id <> dup.Id

Note that the keep.Id <> dup.Id predicate at the end of the join guarantees that the same row never appears as both keep and dup.
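
As a worked example, the sketch below runs the final query against a small in-memory SQLite table from Python. SQLite, the sample rows, and the helper name `keep_and_duplicates` are illustrative assumptions rather than anything from the question, and the `order by` is added only so the rows come back in a stable order.

```python
import sqlite3

# The final query from this answer, plus an ORDER BY for stable output.
QUERY = """
select keep.Email, keep.Id as IdKeep, keep.CreateDate as CreateDateOfIdKeep,
    dup.Id as DuplicateId, dup.CreateDate as CreateDateOfDuplicateId
from (
    select x.Email, x.CreateDate, x.Id
    from myTable x
    join (
        select Email, max(CreateDate) as CreateDate
        from myTable
        group by Email
    ) y on x.Email = y.Email and x.CreateDate = y.CreateDate
) keep
join (
    select x.Email, x.Id, x.CreateDate
    from myTable x
    join (
        select Email
        from myTable
        group by Email
        having count(*) > 1
    ) y on x.Email = y.Email
) dup on keep.Email = dup.Email and keep.Id <> dup.Id
order by keep.Email, dup.Id
"""

# Hypothetical sample data: (Id, Email, CreateDate) with ISO date strings.
SAMPLE_ROWS = [
    (1, "a@example.com", "2015-01-01"),
    (2, "a@example.com", "2015-02-01"),
    (3, "a@example.com", "2015-03-01"),
    (4, "b@example.com", "2015-01-15"),
    (5, "c@example.com", "2015-01-10"),
    (6, "c@example.com", "2015-01-20"),
]


def keep_and_duplicates(rows):
    """Load rows into a scratch table and return one result row per duplicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table myTable (Id integer primary key, Email text, CreateDate text)"
    )
    conn.executemany(
        "insert into myTable (Id, Email, CreateDate) values (?, ?, ?)", rows
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in keep_and_duplicates(SAMPLE_ROWS):
        print(row)
```

With this data, a@example.com has two older duplicates and so produces two rows, each repeating the email and IdKeep = 3, which is the layout the question asks for. b@example.com occurs only once and does not appear at all.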

2015-04-02 01:27:01

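This is amazing, and I think it gives me exactly what I was looking for. My only question is where I can insert a WHERE clause to remove the instances where 'Email' is NULL? – sqlbg 2015-04-02 02:05:48

Wherever you like, really, although this query does not actually handle the NULL case, and no NULLs will appear in the result set anyway ('keep.Email = dup.Email' evaluates to NULL if either side is NULL). I assumed the email field was not null. – 2015-04-02 04:47:36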
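
To illustrate the point about NULLs, here is a small sketch (SQLite run from Python, with made-up rows and a hypothetical helper name `duplicate_pairs`) that joins a table to itself on Email, the same kind of equality used in the join condition of the query. Two rows whose Email is NULL never pair up, because NULL = NULL does not evaluate to true.

```python
import sqlite3

# Hypothetical rows: ids 3 and 4 both have a NULL email.
ROWS_WITH_NULLS = [
    (1, "a@example.com", "2015-01-01"),
    (2, "a@example.com", "2015-02-01"),
    (3, None, "2015-01-05"),
    (4, None, "2015-02-05"),
]


def duplicate_pairs(rows):
    """Self-join on Email and return (Email, Id, other Id) for every pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table myTable (Id integer primary key, Email text, CreateDate text)"
    )
    conn.executemany(
        "insert into myTable (Id, Email, CreateDate) values (?, ?, ?)", rows
    )
    pairs = conn.execute(
        """
        select a.Email, a.Id, b.Id
        from myTable a
        join myTable b on a.Email = b.Email and a.Id <> b.Id
        order by a.Id, b.Id
        """
    ).fetchall()
    conn.close()
    return pairs


if __name__ == "__main__":
    # Only the a@example.com rows pair up; the NULL rows are silently dropped.
    print(duplicate_pairs(ROWS_WITH_NULLS))
```

An explicit `where Email is not null` inside the subqueries would therefore not change the result; it would only state the intent more clearly.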

The following subquery uses a trick to get the most recent ID and create date for each email:

select Email, max(CreateDate) as CreateDate, 
     substring_index(group_concat(id order by CreateDate desc), ',', 1) as id 
from myTable 
group by Email 
having count(*) > 1;

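The having clause also guarantees that only duplicated emails are returned.

Then this query just needs to be combined with the rest of the data to get the format you want: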

select t.Email, tkeep.id as keep_id, tkeep.CreateDate as keep_date,
    t.id as dup_id, t.CreateDate as dup_CreateDate  -- qualified with t: both tables have id and CreateDate
from myTable t
join (
    -- newest id and create date for each duplicated email
    select Email, max(CreateDate) as CreateDate,
        substring_index(group_concat(id order by CreateDate desc), ',', 1) as id
    from myTable
    group by Email
    having count(*) > 1
) tkeep on t.Email = tkeep.Email and t.CreateDate <> tkeep.CreateDate;
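
For readers unfamiliar with the substring_index(group_concat(...)) idiom, the sketch below emulates in plain Python what the tkeep subquery computes: within each email group, the ids are concatenated newest first, and only the text before the first comma is kept. The function name and sample rows are hypothetical, and this is an illustration of the idea, not MySQL's implementation.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: (id, email, create_date) with ISO date strings.
SAMPLE_ROWS = [
    (1, "a@example.com", "2015-01-01"),
    (2, "a@example.com", "2015-02-01"),
    (3, "a@example.com", "2015-03-01"),
    (4, "b@example.com", "2015-01-15"),
    (5, "c@example.com", "2015-01-10"),
    (6, "c@example.com", "2015-01-20"),
]


def latest_per_duplicated_email(rows):
    """Map each duplicated email to (max create date, newest id as text)."""
    result = {}
    by_email = sorted(rows, key=itemgetter(1))  # groupby needs sorted input
    for email, group in groupby(by_email, key=itemgetter(1)):
        members = list(group)
        if len(members) <= 1:  # having count(*) > 1
            continue
        newest_first = sorted(members, key=itemgetter(2), reverse=True)
        # group_concat(id order by CreateDate desc)
        concatenated = ",".join(str(row[0]) for row in newest_first)
        # substring_index(..., ',', 1): everything before the first comma
        first_id = concatenated.split(",", 1)[0]
        result[email] = (max(row[2] for row in members), first_id)
    return result


if __name__ == "__main__":
    print(latest_per_duplicated_email(SAMPLE_ROWS))
```

As in the SQL version, the id comes back as text rather than as a number, because it is cut out of a concatenated string.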

2015-04-02 01:58:11
