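Excluding duplicate records that directly follow one another in Redshift
I have a simple SQL problem that I can't solve (I'm using Amazon Redshift).
Say I have the following example: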
id, type, channel, date, column1, column2, column3, column4
1, visit, seo, 07/08/2017: 11:11:22
1, hit, seo, 07/08/2017: 11:12:34
1, hit, seo, 07/08/2017: 11:13:22
1, visit, sem, 07/08/2017: 11:15:11
1, scarf, display, 07/08/2017: 11:15:45
1, hit, display, 07/08/2017: 11:15:37
1, hit, seo, 07/08/2017: 11:18:22
1, hit, display, 07/08/2017: 11:18:23
1, hit, referal, 07/08/2017: 11:19:55
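I want to select all visits (which, in my table's logic, mark the start of each set of rows for a given id) and exclude rows whose "channel" duplicates the channel of the row directly before it. My example should return: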
1, visit, seo, 07/08/2017: 11:11:22
**1, hit, seo, 07/08/2017: 11:12:34** (exclude because it follows seo and it's not a visit)
**1, hit, seo, 07/08/2017: 11:13:22** (exclude because it follows seo and it's not a visit)
1, visit, sem, 07/08/2017: 11:15:11 (include, new channel)
1, scarf, display, 07/08/2017: 11:15:45 (include, new channel)
**1, hit, display, 07/08/2017: 11:15:37** (exclude because it follows display and it's not a visit)
1, hit, seo, 07/08/2017: 11:18:22 (include because it doesn't follow seo directly, even if seo is already present)
1, hit, display, 07/08/2017: 11:18:23 (include because it doesn't follow display directly, even if display is already present)
1, hit, referal, 07/08/2017: 11:19:55 (include, new channel)
I tried using ROW_NUMBER (since I'm working on Redshift):
select type, date, id, ROW_NUMBER() OVER (PARTITION BY id, channel ORDER BY date) as rn
and then added a filter:
Where type='visit' or rn=1
But this doesn't solve the problem, because it doesn't return rows 7 and 8:
1, hit, seo, 07/08/2017: 11:18:22 (will be rn=4 for 'id=1, channel=seo' combination)
1, hit, display, 07/08/2017: 11:18:23 (will be rn=3 for 'id=1, channel=display' combination)
Could anyone give me some pointers on how to solve this?
@FuzzyTree Hi, I didn't know about this window function, but it clearly solves my problem. Thanks a lot (y) – Amine
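The comment above doesn't name the window function it refers to. One that fits the requirement is LAG, which exposes the previous row's channel so each row can be compared with the row directly before it. Below is a minimal sketch of that idea, not necessarily the answer the comment is thanking. It assumes a hypothetical table named `events`, keeps the sample timestamps as text, and runs the query through Python's built-in sqlite3 (SQLite 3.25 or later) as a stand-in for Redshift, which also provides LAG.

```python
import sqlite3

# Sample rows from the question. Timestamps are kept as text exactly as
# posted; they share one date prefix, so text order matches time order.
# In a real table this would presumably be a timestamp column.
rows = [
    (1, "visit", "seo", "07/08/2017: 11:11:22"),
    (1, "hit", "seo", "07/08/2017: 11:12:34"),
    (1, "hit", "seo", "07/08/2017: 11:13:22"),
    (1, "visit", "sem", "07/08/2017: 11:15:11"),
    (1, "scarf", "display", "07/08/2017: 11:15:45"),
    (1, "hit", "display", "07/08/2017: 11:15:37"),
    (1, "hit", "seo", "07/08/2017: 11:18:22"),
    (1, "hit", "display", "07/08/2017: 11:18:23"),
    (1, "hit", "referal", "07/08/2017: 11:19:55"),
]

# Keep a row if it is a visit, if it is the first row for its id,
# or if its channel differs from the channel of the row just before it.
QUERY = """
SELECT id, type, channel, date
FROM (
    SELECT id, type, channel, date,
           LAG(channel) OVER (PARTITION BY id ORDER BY date) AS prev_channel
    FROM events  -- hypothetical table name
) AS t
WHERE type = 'visit'
   OR prev_channel IS NULL
   OR prev_channel <> channel
ORDER BY id, date
"""


def drop_repeated_channels(sample_rows):
    """Load the rows into an in-memory SQLite table and run the LAG query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER, type TEXT, channel TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", sample_rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


kept = drop_repeated_channels(rows)
for row in kept:
    print(row)
```

Unlike ROW_NUMBER partitioned by id and channel, this compares each row only with its immediate predecessor, so the later seo and display hits (rows 7 and 8) are kept. One caveat with the sample data: the two display rows at 11:15:45 and 11:15:37 are listed out of time order, so ordering by date keeps the 11:15:37 hit (it follows sem) and drops the 11:15:45 row, the reverse of the listing in the question.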