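Identify and delete duplicates

I have to clean up a database that ended up with duplicate rows because of incorrect application code.

To get the necessary data, I joined the tables containing quiz users, questions, and answers. This gives me: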

UserId | QuestionId | AnswerId | ChoiceId | LastUpdated             | MaxAnswers
--------------------------------------------------------------------------------
17     | 17         | 374526   | 65       | 2014-01-21 16:08:00.057 | 3
17     | 17         | 3497     | 61       | NULL                    | 3
17     | 17         | 3498     | 69       | NULL                    | 3
17     | 17         | 3499     | 70       | NULL                    | 3
17     | 17         | 3500     | 72       | NULL                    | 3
17     | 17         | 4071     | 62       | NULL                    | 3
17     | 17         | 4072     | 63       | NULL                    | 3
17     | 17         | 258050   | 64       | NULL                    | 3
17     | 43         | 4059     | 210      | NULL                    | 1
17     | 43         | 4060     | 210      | NULL                    | 1
17     | 110        | 533242   | 12       | NULL                    | 2
17     | 110        | 536466   | 12       | NULL                    | 2
17     | 110        | 577857   | 12       | 2015-09-24 09:13:15.127 | 2

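For each question and each user, I have to keep the top X answers, where X is MaxAnswers, ordered by LastUpdated DESC, AnswerId DESC, and delete the rest. The exception is when a ChoiceId occurs more than once, in which case only one row with that ChoiceId should be kept. For a given QuestionId, MaxAnswers is always the same.

I currently have the select above (note: the sample data above originally showed AnswerId ASC; this has been corrected), but I don't know how to go on from there (I assume by using partition?).

Edit: The expected output for this sample would be: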

UserId | QuestionId | AnswerId | ChoiceId | LastUpdated             | MaxAnswers
--------------------------------------------------------------------------------
17     | 17         | 374526   | 65       | 2014-01-21 16:08:00.057 | 3
17     | 17         | 258050   | 64       | NULL                    | 3
17     | 17         | 4072     | 63       | NULL                    | 3
17     | 43         | 4060     | 210      | NULL                    | 1
17     | 110        | 577857   | 12       | 2015-09-24 09:13:15.127 | 2

Source

2015-10-05 tsc

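What have you tried? – MusicLovingIndianGirl

Would you mind posting the expected output for the sample table data above? – Wanderer

@AishvaryaKarthik: This is about as far as I got with my very limited SQL Server knowledge. I was thinking of doing a 'DELETE FROM Answers WHERE AnswerId NOT IN (<result of the select above>)'. I assume I have to use partition or something similar, but I'm not familiar with them. – tsc

Please try the following code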

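-- Number each user's answers per question, newest first
-- (in SQL Server, NULL LastUpdated values sort last under DESC).
;with cte as (
    select
        *,
        rn = row_number() over (partition by UserId, QuestionId order by LastUpdated desc, AnswerId desc)
    from UserAnswers
)
-- Delete every row ranked beyond the question's MaxAnswers limit.
delete u
from UserAnswers u
inner join cte
    on u.UserId = cte.UserId and
       u.QuestionId = cte.QuestionId and
       u.AnswerId = cte.AnswerId
where cte.rn > cte.MaxAnswers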

You can also refer to the SQL tutorial in which the SQL Row_Number() function is used to delete duplicate rows.
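
The statement above trims each user's answers per question to MaxAnswers rows, but it does not apply the other rule in the question: when a ChoiceId occurs more than once, only one row for it should survive. One possible way to cover both rules is sketched below. It reuses the UserAnswers table and column names from this answer and assumes that AnswerId is unique within a user and question. The CTE names dedup and keep and the columns choice_rn and rn are made up for illustration. It is a sketch, so try it inside a transaction or on a copy of the data first.

```sql
-- Sketch only: assumes the UserAnswers table from this answer and that
-- AnswerId is unique per (UserId, QuestionId).
;with dedup as (
    select
        UserId, QuestionId, AnswerId, LastUpdated, MaxAnswers,
        -- choice_rn = 1 marks the newest row for each repeated ChoiceId
        choice_rn = row_number() over (
            partition by UserId, QuestionId, ChoiceId
            order by LastUpdated desc, AnswerId desc)
    from UserAnswers
),
keep as (
    select
        UserId, QuestionId, AnswerId, MaxAnswers,
        -- rank only the de-duplicated rows, newest first
        rn = row_number() over (
            partition by UserId, QuestionId
            order by LastUpdated desc, AnswerId desc)
    from dedup
    where choice_rn = 1
)
-- Delete every row that is not among the top MaxAnswers de-duplicated rows.
delete u
from UserAnswers u
where not exists (
    select 1
    from keep k
    where k.UserId = u.UserId
      and k.QuestionId = u.QuestionId
      and k.AnswerId = u.AnswerId
      and k.rn <= k.MaxAnswers
);
```

Worked through by hand against the sample in the question, this leaves AnswerId 374526, 258050 and 4072 for question 17, 4060 for question 43 and 577857 for question 110, which matches the expected output posted there.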

Here is the test data

create table UserAnswers (
    UserId int, QuestionId int, AnswerId int, ChoiceId int, LastUpdated datetime, MaxAnswers int
)
insert into UserAnswers select 17, 17, 374526, 65, '2014-01-21 16:08:00.057', 3
insert into UserAnswers select 17, 17, 3497, 61, NULL, 3
insert into UserAnswers select 17, 17, 3498, 69, NULL, 3
insert into UserAnswers select 17, 17, 3499, 70, NULL, 3
insert into UserAnswers select 17, 17, 3500, 72, NULL, 3
insert into UserAnswers select 17, 17, 4071, 62, NULL, 3
insert into UserAnswers select 17, 17, 4072, 63, NULL, 3
insert into UserAnswers select 17, 17, 258050, 64, NULL, 3
insert into UserAnswers select 17, 43, 4059, 210, NULL, 1
insert into UserAnswers select 17, 43, 4060, 210, NULL, 1
insert into UserAnswers select 17, 110, 533242, 12, '2015-09-24 09:13:15.127', 2
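
Before running a delete against this table, it can help to preview what the ranking does. The query below is a read-only sketch that applies the same row_number() ordering as the delete statement in this answer and labels each row. The CTE name ranked and the Decision column are illustrative names, not part of the original answer. Note that this ranking alone does not collapse repeated ChoiceId values.

```sql
-- Preview only: nothing is deleted here.
;with ranked as (
    select
        UserId, QuestionId, AnswerId, ChoiceId, LastUpdated, MaxAnswers,
        -- same ordering as the delete: newest first, NULL dates last
        rn = row_number() over (
            partition by UserId, QuestionId
            order by LastUpdated desc, AnswerId desc)
    from UserAnswers
)
select
    *,
    -- rows ranked beyond MaxAnswers are the ones the delete would remove
    Decision = case when rn > MaxAnswers then 'delete' else 'keep' end
from ranked
order by UserId, QuestionId, rn;
```

With the test rows above, question 17 gets rn 1 to 3 for AnswerId 374526, 258050 and 4072, and for question 43 the row with AnswerId 4059 is the one marked for deletion.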

Source

2015-10-05 06:10:03 Eralper
