2015-10-05

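Identifying and deleting duplicates

I have to clean up a database that ended up with duplicate rows because of incorrect application code.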

To get the necessary data, I joined the tables containing the quiz users, questions and answers. This gives me:

UserId | QuestionId | AnswerId | ChoiceId | LastUpdated             | MaxAnswers
--------------------------------------------------------------------------------
17     | 17         | 374526   | 65       | 2014-01-21 16:08:00.057 | 3
17     | 17         | 3497     | 61       | NULL                    | 3
17     | 17         | 3498     | 69       | NULL                    | 3
17     | 17         | 3499     | 70       | NULL                    | 3
17     | 17         | 3500     | 72       | NULL                    | 3
17     | 17         | 4071     | 62       | NULL                    | 3
17     | 17         | 4072     | 63       | NULL                    | 3
17     | 17         | 258050   | 64       | NULL                    | 3
17     | 43         | 4059     | 210      | NULL                    | 1
17     | 43         | 4060     | 210      | NULL                    | 1
17     | 110        | 533242   | 12       | NULL                    | 2
17     | 110        | 536466   | 12       | NULL                    | 2
17     | 110        | 577857   | 12       | 2015-09-24 09:13:15.127 | 2

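For each Question and User, I have to keep the top X answers, where X is MaxAnswers, ordered by LastUpdated DESC, AnswerId DESC, and delete the rest. The exception is when a ChoiceId appears more than once, in which case only one row for that ChoiceId should be kept. For a given QuestionId, MaxAnswers is always the same.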

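I currently have the select above (note: in the data sample above I had AnswerId ASC; this has been corrected), but I don't know where to go from here (I assume by using partition?).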

Edit: the expected output for this sample would be:

UserId | QuestionId | AnswerId | ChoiceId | LastUpdated             | MaxAnswers
--------------------------------------------------------------------------------
17     | 17         | 374526   | 65       | 2014-01-21 16:08:00.057 | 3
17     | 17         | 258050   | 64       | NULL                    | 3
17     | 17         | 4072     | 63       | NULL                    | 3
17     | 43         | 4060     | 210      | NULL                    | 1
17     | 110        | 577857   | 12       | 2015-09-24 09:13:15.127 | 2

What have you tried? – MusicLovingIndianGirl

Would you mind posting the expected output for the sample table data above? – Wanderer

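@AishvaryaKarthik: This is as far as I have got with my very limited SQL Server knowledge. I was thinking of doing a `DELETE FROM Answers WHERE AnswerId NOT IN (<result of the select above>)`. I assume I have to use partition or something similar, but I am not familiar with it. – tsc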

Answer

Please try the following code:

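;with cte as (
    -- number each user's answers per question, newest first
    -- (in SQL Server, NULL LastUpdated values sort last under DESC)
    select
        *,
        rn = row_number() over (
            partition by UserId, QuestionId
            order by LastUpdated desc, AnswerId desc)
    from UserAnswers
)
-- delete every row ranked beyond MaxAnswers for its user and question
-- note: rows sharing a ChoiceId are ranked separately, so repeated choices are not collapsed
delete u
from UserAnswers u
inner join cte
    on u.UserId = cte.UserId and
       u.QuestionId = cte.QuestionId and
       u.AnswerId = cte.AnswerId
where cte.rn > cte.MaxAnswers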
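
The question also asks that a ChoiceId appearing more than once within a question keeps only one row. A possible way to cover that with the same row_number() technique is sketched below. It is only a sketch under assumptions: it assumes repeated choices are collapsed before the top-MaxAnswers cut is applied (the question does not say which comes first), and that AnswerId identifies a row within a user and question. The CTE names ranked_choices and ranked_answers are made up for illustration.

```sql
;with ranked_choices as (
    -- choice_rn = 1 marks the newest row for each user, question and choice
    select
        UserId, QuestionId, AnswerId, LastUpdated, MaxAnswers,
        choice_rn = row_number() over (
            partition by UserId, QuestionId, ChoiceId
            order by LastUpdated desc, AnswerId desc)
    from UserAnswers
),
ranked_answers as (
    -- rank only the surviving row of each choice, newest first
    select
        UserId, QuestionId, AnswerId, MaxAnswers,
        answer_rn = row_number() over (
            partition by UserId, QuestionId
            order by LastUpdated desc, AnswerId desc)
    from ranked_choices
    where choice_rn = 1
)
-- delete everything that is not among the top MaxAnswers distinct choices
delete u
from UserAnswers u
where not exists (
    select 1
    from ranked_answers k
    where k.UserId = u.UserId
      and k.QuestionId = u.QuestionId
      and k.AnswerId = u.AnswerId
      and k.answer_rn <= k.MaxAnswers
)
```

On the sample rows listed in the question, this should leave the five rows shown as the expected output. Running the delete inside a transaction and inspecting the remaining rows before committing is a cheap safeguard.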

You can also refer to the following SQL tutorial, in which the SQL Row_Number() function is used to delete duplicate rows.

Here is the test setup:

create table UserAnswers (
    UserId int, QuestionId int, AnswerId int, ChoiceId int, LastUpdated datetime, MaxAnswers int
)
insert into UserAnswers select 17, 17, 374526, 65, '2014-01-21 16:08:00.057', 3
insert into UserAnswers select 17, 17, 3497, 61, NULL, 3
insert into UserAnswers select 17, 17, 3498, 69, NULL, 3
insert into UserAnswers select 17, 17, 3499, 70, NULL, 3
insert into UserAnswers select 17, 17, 3500, 72, NULL, 3
insert into UserAnswers select 17, 17, 4071, 62, NULL, 3
insert into UserAnswers select 17, 17, 4072, 63, NULL, 3
insert into UserAnswers select 17, 17, 258050, 64, NULL, 3
insert into UserAnswers select 17, 43, 4059, 210, NULL, 1
insert into UserAnswers select 17, 43, 4060, 210, NULL, 1
insert into UserAnswers select 17, 110, 533242, 12, '2015-09-24 09:13:15.127', 2