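SQL Server 2005 - Delete duplicate records while keeping the first record

I am currently working on a DataImport script that is intended to move data from one database to another. The main problem I have run into is that the table in question contains a lot of duplicate records, with the duplicated fields being product code, language, legislation, brand name, formula and version, i.e. we might have the following in the database: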
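My Test Product, English, UK, Test Brand, Test Formula, 1 (ID 1 - not included in the GROUP BY)
My Test Product, English, UK, Test Brand, Test Formula, 1 (ID 2 - not included in the GROUP BY)
My Test Product, English, UK, Test Brand, Test Formula, 1 (ID 3 - not included in the GROUP BY)
My Test Product, English, UK, Test Brand, Test Formula, 1 (ID 4 - not included in the GROUP BY)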
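As you can see, these records are identical in every respect. My problem is that, as part of the data load script, I want to delete the records with IDs 1, 2 and 3 while keeping the record with ID 4, since it is the most recent one and therefore the one I want to keep. To do this, I have written the following T-SQL script: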
-- get the list of items where there is at least one duplicate
DECLARE cDuplicateList CURSOR FOR
    SELECT productcode, languageid, legislationid, brandName, versionnumber, formulaid
    FROM allproducts
    GROUP BY productcode, languageid, legislationid, brandName, versionnumber, formulaid
    HAVING COUNT (*) > 1

OPEN cDuplicateList

FETCH cDuplicateList INTO @productCode, @languageId, @legislationId, @brandName, @versionNumber, @formulaId

-- while there are still duplicates
WHILE @@FETCH_STATUS=0
BEGIN
    -- delete from the table where the product ID is in the sub-query, which contains all
    -- of the records apart from the last one
    DELETE FROM AllProducts
    WHERE productId IN
    (
        SELECT productId
        FROM allProducts
        WHERE productCode = @productCode
        AND (languageId = @languageId OR @languageId IS NULL)
        AND (legislationId = @legislationId OR @legislationId IS NULL)
        AND (brandName = @brandName OR @brandName IS NULL)
        AND (versionNumber = @versionNumber OR @versionNumber IS NULL)
        AND (formulaId = @formulaId OR @formulaId IS NULL)
        EXCEPT
        SELECT TOP 1 productId
        FROM allProducts
        WHERE productCode = @productCode
        AND (languageId = @languageId OR @languageId IS NULL)
        AND (legislationId = @legislationId OR @legislationId IS NULL)
        AND (brandName = @brandName OR @brandName IS NULL)
        AND (versionNumber = @versionNumber OR @versionNumber IS NULL)
        AND (formulaId = @formulaId OR @formulaId IS NULL)
    )

    FETCH cDuplicateList INTO @productCode, @languageId, @legislationId, @brandName, @versionNumber, @formulaId
END
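Now, this works; it is just incredibly slow, and I cannot think of any easy way to make it faster. Does anyone have any ideas on how I can keep the same functionality but make it run faster?

WITH CTE AS
(
    SELECT productCode, languageId, legislationId, brandName, formulaId, versionNumber,
           RN = ROW_NUMBER() OVER (
                    PARTITION BY productCode, languageId, legislationId, brandName, formulaId, versionNumber
                    ORDER BY productId DESC)
    FROM AllProducts
)
DELETE FROM CTE WHERE RN > 1

Change DELETE to SELECT * if you want to see what you are about to delete.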
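To make the preview step concrete, here is a minimal sketch that can be run in a scratch session before touching the real table. The temp table #AllProductsDemo, its column types and the sample values are assumptions made up for illustration; only the column names are taken from the question's script.

```sql
-- Hypothetical scratch table: the column types are assumptions,
-- not the real AllProducts schema.
CREATE TABLE #AllProductsDemo
(
    productId     INT IDENTITY(1,1) PRIMARY KEY,
    productCode   VARCHAR(50) NOT NULL,
    languageId    INT NULL,
    legislationId INT NULL,
    brandName     VARCHAR(50) NULL,
    versionNumber INT NULL,
    formulaId     INT NULL
);

-- Four identical rows (productId 1 to 4) and one distinct row (productId 5).
-- Single-row INSERTs are used because SQL Server 2005 has no multi-row VALUES.
INSERT INTO #AllProductsDemo (productCode, languageId, legislationId, brandName, versionNumber, formulaId)
VALUES ('My Test Product', 1, 1, 'Test Brand', 1, 1);
INSERT INTO #AllProductsDemo (productCode, languageId, legislationId, brandName, versionNumber, formulaId)
VALUES ('My Test Product', 1, 1, 'Test Brand', 1, 1);
INSERT INTO #AllProductsDemo (productCode, languageId, legislationId, brandName, versionNumber, formulaId)
VALUES ('My Test Product', 1, 1, 'Test Brand', 1, 1);
INSERT INTO #AllProductsDemo (productCode, languageId, legislationId, brandName, versionNumber, formulaId)
VALUES ('My Test Product', 1, 1, 'Test Brand', 1, 1);
INSERT INTO #AllProductsDemo (productCode, languageId, legislationId, brandName, versionNumber, formulaId)
VALUES ('Other Product', 1, 1, 'Test Brand', 1, 1);

-- Preview only: within each group of duplicates the highest productId gets RN = 1,
-- so the rows returned here (RN > 1) are the ones the DELETE would remove.
WITH CTE AS
(
    SELECT productId, productCode,
           RN = ROW_NUMBER() OVER (
                    PARTITION BY productCode, languageId, legislationId, brandName, formulaId, versionNumber
                    ORDER BY productId DESC)
    FROM #AllProductsDemo
)
SELECT * FROM CTE WHERE RN > 1;   -- returns productId 1, 2 and 3 (row order not guaranteed)

DROP TABLE #AllProductsDemo;
```

Note that PARTITION BY, like GROUP BY, places NULL values in the same group, so rows whose nullable columns are all NULL in the same positions are treated as duplicates of one another.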
Possible duplicate of [How can I remove duplicate rows?](http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows)