2015-04-02 35 views

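Duplicate email addresses with ID column

My table contains duplicate email addresses. Each email address has a unique creation date and a unique ID. I want to identify, for each email address, the row with the most recent creation date and its associated ID, and show the duplicate IDs along with their creation dates. I'd like the query to display this in the following format: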

  • Column 1: EmailAddress
  • Column 2: IDKeep
  • Column 3: CreateDateofIDKeep
  • Column 4: DuplicateID
  • Column 5: CreateDateofDuplicateID

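Note: in some cases there are more than 2 duplicates of an email address. I'd like the query to show each additional duplicate on a new row, repeating EmailAddress and IDKeep in those cases.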

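I've tried piecing together different queries found here, to no avail. I'm currently at a loss. Any help or direction would be greatly appreciated.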


RazorSQL is an interface, not a database. Which database are you using? – 2015-04-02 01:50:36

Answers

1

Complex queries are best solved by breaking them into pieces and working step by step.

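First, let's create a query that finds the key of the row we want to keep, by finding the most recent create date for each email and then joining back to the table to get the Id: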

select x.Email, x.CreateDate, x.Id 
from myTable x 
join (
    select Email, max(CreateDate) as CreateDate 
    from myTable 
    group by Email 
) y on x.Email = y.Email and x.CreateDate = y.CreateDate 
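
To see what this first step returns, here is a minimal sketch that runs it on a few made-up rows using Python's built-in sqlite3 module. SQLite and the sample rows are assumptions for illustration only; the question doesn't say which database is in use.

```python
import sqlite3

# Hypothetical sample data (Id, Email, CreateDate); not from the question.
rows = [
    (1, "a@example.com", "2015-01-01"),
    (2, "a@example.com", "2015-02-01"),
    (3, "a@example.com", "2015-03-01"),
    (4, "b@example.com", "2015-01-15"),
    (5, "c@example.com", "2015-01-20"),
    (6, "c@example.com", "2015-02-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (Id integer, Email text, CreateDate text)")
conn.executemany("insert into myTable values (?, ?, ?)", rows)

# Latest CreateDate per Email, joined back to the table to pick up the Id.
keep_rows = conn.execute("""
    select x.Email, x.CreateDate, x.Id
    from myTable x
    join (
        select Email, max(CreateDate) as CreateDate
        from myTable
        group by Email
    ) y on x.Email = y.Email and x.CreateDate = y.CreateDate
    order by x.Email
""").fetchall()

for row in keep_rows:
    print(row)
```

Every email shows up exactly once at this stage, including b@example.com, which has no duplicate. The sketch stores dates as ISO-formatted text, which sorts correctly under max(); a real date column would work the same way.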

OK, now let's write a query to get the duplicated email addresses:

select Email 
from myTable 
group by Email 
having count(*) > 1 

Then join this query back to the table to get the keys of every row whose email is duplicated:

select x.Email, x.Id, x.CreateDate 
from myTable x 
join (
    select Email 
    from myTable 
    group by Email 
    having count(*) > 1 
) y on x.Email = y.Email 
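
As a rough check of these two steps, the sketch below runs both on made-up rows with Python's sqlite3 module. SQLite and the sample data are assumptions for illustration, not something stated in the question.

```python
import sqlite3

# Hypothetical sample data (Id, Email, CreateDate); not from the question.
rows = [
    (1, "a@example.com", "2015-01-01"),
    (2, "a@example.com", "2015-02-01"),
    (3, "a@example.com", "2015-03-01"),
    (4, "b@example.com", "2015-01-15"),
    (5, "c@example.com", "2015-01-20"),
    (6, "c@example.com", "2015-02-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (Id integer, Email text, CreateDate text)")
conn.executemany("insert into myTable values (?, ?, ?)", rows)

# Step 1: emails that occur more than once.
dup_emails = conn.execute("""
    select Email
    from myTable
    group by Email
    having count(*) > 1
    order by Email
""").fetchall()

# Step 2: every row belonging to one of those emails.
dup_rows = conn.execute("""
    select x.Email, x.Id, x.CreateDate
    from myTable x
    join (
        select Email
        from myTable
        group by Email
        having count(*) > 1
    ) y on x.Email = y.Email
    order by x.Email, x.Id
""").fetchall()

print(dup_emails)
for row in dup_rows:
    print(row)
```

Note that the second result still contains the newest row for each email (Id 3 and Id 6 here); separating it from the older rows is left to the final join.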

Great. Now all that's left is to join the first query with this one to get our result:

select keep.Email, keep.Id as IdKeep, keep.CreateDate as CreateDateOfIdKeep, 
    dup.Id as DuplicateId, dup.CreateDate as CreateDateOfDuplicateId 
from (
    select x.Email, x.CreateDate, x.Id 
    from myTable x 
    join (
     select Email, max(CreateDate) as CreateDate 
     from myTable 
     group by Email 
    ) y on x.Email = y.Email and x.CreateDate = y.CreateDate 
) keep 
join (
    select x.Email, x.Id, x.CreateDate 
    from myTable x 
    join (
     select Email 
     from myTable 
     group by Email 
     having count(*) > 1 
    ) y on x.Email = y.Email 
) dup on keep.Email = dup.Email and keep.Id <> dup.Id 

Note that the final keep.Id <> dup.Id predicate on the join ensures that we never get the same row as both keep and dup.
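
For anyone who wants to try the full query before pointing it at real data, here is a minimal sketch using Python's sqlite3 module. The choice of SQLite and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data (Id, Email, CreateDate); not from the question.
rows = [
    (1, "a@example.com", "2015-01-01"),
    (2, "a@example.com", "2015-02-01"),
    (3, "a@example.com", "2015-03-01"),
    (4, "b@example.com", "2015-01-15"),
    (5, "c@example.com", "2015-01-20"),
    (6, "c@example.com", "2015-02-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (Id integer, Email text, CreateDate text)")
conn.executemany("insert into myTable values (?, ?, ?)", rows)

# The combined query from the answer, with an ORDER BY added for stable output.
result = conn.execute("""
    select keep.Email, keep.Id as IdKeep, keep.CreateDate as CreateDateOfIdKeep,
           dup.Id as DuplicateId, dup.CreateDate as CreateDateOfDuplicateId
    from (
        select x.Email, x.CreateDate, x.Id
        from myTable x
        join (
            select Email, max(CreateDate) as CreateDate
            from myTable
            group by Email
        ) y on x.Email = y.Email and x.CreateDate = y.CreateDate
    ) keep
    join (
        select x.Email, x.Id, x.CreateDate
        from myTable x
        join (
            select Email
            from myTable
            group by Email
            having count(*) > 1
        ) y on x.Email = y.Email
    ) dup on keep.Email = dup.Email and keep.Id <> dup.Id
    order by keep.Email, dup.Id
""").fetchall()

for row in result:
    print(row)
```

With this data, a@example.com produces two rows (one per older duplicate, each repeating the email and the kept Id), c@example.com produces one, and b@example.com produces none because it has no duplicate.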


This is amazing, and I think it gives me exactly what I'm looking for. My only question: where can I insert a WHERE clause to remove rows where 'Email' is NULL? – sqlbg 2015-04-02 02:05:48


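Wherever you like, really. That said, the query doesn't do anything special for the NULL case, and NULL emails won't appear in the result set anyway ('keep.Email = dup.Email' evaluates to NULL if either side is NULL, so the join drops those rows). I assumed the Email field is not null. – 2015-04-02 04:47:36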
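
On the NULL question, a small sketch may make the behaviour easier to see. It uses Python's sqlite3 module with made-up rows; SQLite and the data are assumptions, though the NULL comparison rules shown are standard SQL.

```python
import sqlite3

# Hypothetical sample data with two NULL emails; not from the question.
rows = [
    (1, "a@example.com", "2015-01-01"),
    (2, "a@example.com", "2015-02-01"),
    (3, None, "2015-01-05"),
    (4, None, "2015-02-05"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (Id integer, Email text, CreateDate text)")
conn.executemany("insert into myTable values (?, ?, ?)", rows)

# GROUP BY puts all NULLs into one group, so NULL looks like a duplicate here...
grouped = conn.execute("""
    select Email, count(*)
    from myTable
    group by Email
    having count(*) > 1
""").fetchall()

# ...but an equality join never matches NULL to NULL, so those rows pair with nothing.
pairs = conn.execute("""
    select a.Id, b.Id
    from myTable a
    join myTable b on a.Email = b.Email and a.Id <> b.Id
    order by a.Id
""").fetchall()

# An explicit filter, if you would rather exclude NULLs up front.
filtered = conn.execute("""
    select Email
    from myTable
    where Email is not null
    group by Email
    having count(*) > 1
""").fetchall()

print(grouped)
print(pairs)
print(filtered)
```

So a filter such as `where Email is not null` inside the subqueries is optional for correctness of the final join, but it makes the intent explicit and keeps the intermediate results free of a NULL group.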

0

The following subquery uses a trick to get the most recent ID and create date for each email:

select Email, max(CreateDate) as CreateDate, 
     substring_index(group_concat(id order by CreateDate desc), ',', 1) as id 
from myTable 
group by Email 
having count(*) > 1; 

The having clause also ensures that only duplicated emails are returned.
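
Since group_concat with an order by and substring_index are MySQL functions, it may help to see what the trick computes in plain terms. The sketch below imitates the subquery step by step in Python on made-up rows; it is an illustration of the logic under that assumption, not how MySQL executes it.

```python
from collections import defaultdict

# Hypothetical sample data (id, Email, CreateDate); not from the question.
rows = [
    (1, "a@example.com", "2015-01-01"),
    (2, "a@example.com", "2015-02-01"),
    (3, "a@example.com", "2015-03-01"),
    (4, "b@example.com", "2015-01-15"),
    (5, "c@example.com", "2015-01-20"),
    (6, "c@example.com", "2015-02-20"),
]

# group by Email
items_by_email = defaultdict(list)
for id_, email, created in rows:
    items_by_email[email].append((created, id_))

latest = {}
for email, items in items_by_email.items():
    if len(items) > 1:                                # having count(*) > 1
        items.sort(reverse=True)                      # order by CreateDate desc
        joined = ",".join(str(i) for _, i in items)   # group_concat(id ...)
        first_id = joined.split(",", 1)[0]            # substring_index(..., ',', 1)
        latest[email] = (items[0][0], first_id)       # max(CreateDate), id

print(latest)
```

One thing this makes visible: the id comes out as text, because it was cut from a concatenated string. If the id column is numeric, a cast may be wanted before comparing or joining on it.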

Then this query just needs to be combined with the rest of the data to get the format you want:

select t.Email, tkeep.id as keep_id, tkeep.CreateDate as keep_date, 
     t.id as dup_id, t.CreateDate as dup_CreateDate 
from myTable t join 
    (select Email, max(CreateDate) as CreateDate, 
      substring_index(group_concat(id order by CreateDate desc), ',', 1) as id 
     from myTable 
     group by Email 
     having count(*) > 1 
    ) tkeep 
    on t.Email = tkeep.Email and t.CreateDate <> tkeep.CreateDate;