2012-06-21 162 views
5

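Delete "duplicate" rows in SQL Server 2010

I made a mistake in a bulk insert script, so now I have "duplicate" rows that differ only in colX. I need to delete these duplicate rows, but I don't know how. To be more precise, I have this: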

col1 | col2 | col3 | colX
-----+------+------+-----
   0 |    1 |    2 | a
   0 |    1 |    2 | b
   0 |    1 |    2 | c
   0 |    1 |    2 | a
   3 |    4 |    5 | x
   3 |    4 |    5 | y
   3 |    4 |    5 | x
   3 |    4 |    5 | z

and I want to keep only the first occurrence of each (row, colX):

col1 | col2 | col3 | colX
-----+------+------+-----
   0 |    1 |    2 | a
   3 |    4 |    5 | x

Thanks for your answers :)

+2

Database tables have no concept of row order. Do you want to order by min(colX) and keep those rows? Is there a timestamp column on the rows?

+3

Which version of SQL Server are you using? As far as I know, there is no SQL Server 2010.

+0

If you had a row starting '0 | 1 | 3 |' in your data, should it be kept, or should it be deleted?

Answers

10

Try the simplest approach, a SQL Server CTE: http://www.sqlfiddle.com/#!3/2d386/2

Data:

CREATE TABLE tbl 
    ([col1] int, [col2] int, [col3] int, [colX] varchar(1)); 

INSERT INTO tbl 
    ([col1], [col2], [col3], [colX]) 
VALUES 
    (0, 1, 2, 'a'), 
    (0, 1, 2, 'b'), 
    (0, 1, 2, 'c'), 
    (0, 1, 2, 'a'), 
    (3, 4, 5, 'x'), 
    (3, 4, 5, 'y'), 
    (3, 4, 5, 'x'), 
    (3, 4, 5, 'z'); 

Solution:

select * from tbl; 

with a as 
(
    select row_number() over(partition by col1 order by col2, col3, colX) as rn 
    from tbl 
) 
delete from a where rn > 1; 

select * from tbl; 

Output:

| COL1 | COL2 | COL3 | COLX | 
----------------------------- 
| 0 | 1 | 2 | a | 
| 0 | 1 | 2 | b | 
| 0 | 1 | 2 | c | 
| 0 | 1 | 2 | a | 
| 3 | 4 | 5 | x | 
| 3 | 4 | 5 | y | 
| 3 | 4 | 5 | x | 
| 3 | 4 | 5 | z | 


| COL1 | COL2 | COL3 | COLX | 
----------------------------- 
| 0 | 1 | 2 | a | 
| 3 | 4 | 5 | x | 

Or perhaps this: http://www.sqlfiddle.com/#!3/af826/1

Data:

CREATE TABLE tbl 
    ([col1] int, [col2] int, [col3] int, [colX] varchar(1)); 

INSERT INTO tbl 
    ([col1], [col2], [col3], [colX]) 
VALUES 
    (0, 1, 2, 'a'), 
    (0, 1, 2, 'b'), 
    (0, 1, 2, 'c'), 
    (0, 1, 2, 'a'), 
    (0, 1, 3, 'a'), 
    (3, 4, 5, 'x'), 
    (3, 4, 5, 'y'), 
    (3, 4, 5, 'x'), 
    (3, 4, 5, 'z'); 

Solution:

select * from tbl; 


with a as 
(
    select row_number() over(partition by col1, col2, col3 order by colX) as rn 
    from tbl 
) 
delete from a where rn > 1; 

select * from tbl; 

Output:

| COL1 | COL2 | COL3 | COLX | 
----------------------------- 
| 0 | 1 | 2 | a | 
| 0 | 1 | 2 | b | 
| 0 | 1 | 2 | c | 
| 0 | 1 | 2 | a | 
| 0 | 1 | 3 | a | 
| 3 | 4 | 5 | x | 
| 3 | 4 | 5 | y | 
| 3 | 4 | 5 | x | 
| 3 | 4 | 5 | z | 

| COL1 | COL2 | COL3 | COLX | 
----------------------------- 
| 0 | 1 | 2 | a | 
| 0 | 1 | 3 | a | 
| 3 | 4 | 5 | x | 
+0

That did it, thanks a lot.

2

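If you have a lot of duplicates, I would suggest using a CTE and reading all the non-duplicate records into a separate table. That said, there is a recommended post to follow: MSDN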
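
A minimal sketch of that idea, assuming the table is named tbl with the columns from the question; tbl_dedup is a hypothetical name for the new table, and "first" here means the smallest colX within each (col1, col2, col3) group:

```sql
-- Copy one row per (col1, col2, col3) group into a new table.
-- tbl_dedup is a hypothetical name; SELECT ... INTO creates it.
;WITH ranked AS
(
    SELECT col1, col2, col3, colX,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                              ORDER BY colX) AS rn
    FROM tbl
)
SELECT col1, col2, col3, colX
INTO tbl_dedup
FROM ranked
WHERE rn = 1;   -- keep only the first row of each group
```

The original table is left untouched, so the result can be checked before anything is dropped or renamed.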

+1

Looks like you were the first to mention the SQL "CTE" approach, which is the simplest and works in most scenarios.

1

Assuming colX is unique (which is not the case in your example, even though you said "different colX"), you can use the following to delete the duplicates:

;with cteDuplicates as 
(
    select 
     *, 
     row_number() over (partition by col1, col2, col3 order by colX) as ID 
    from Duplicates 
) 
delete D from Duplicates D 
    inner join cteDuplicates C on C.colX = D.Colx 
where ID > 1 

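(assuming your table is named "Duplicates")

If colX is not unique, add a new unique identifier column, populate it with distinct values, and then use the code above, joining on that column instead of colX.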
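
As a rough sketch of that variant: RowId is a hypothetical column name, and an identity column is only one way to get a unique value per row. The GO separator assumes a client such as SSMS or sqlcmd, since the new column cannot be referenced in the same batch that adds it.

```sql
-- Add a surrogate key; existing rows receive values automatically.
-- RowId is a hypothetical column name.
ALTER TABLE Duplicates ADD RowId int IDENTITY(1,1);
GO

;WITH cteDuplicates AS
(
    SELECT RowId,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                              ORDER BY colX, RowId) AS rn
    FROM Duplicates
)
DELETE D
FROM Duplicates AS D
    INNER JOIN cteDuplicates AS C ON C.RowId = D.RowId  -- join on the unique key
WHERE C.rn > 1;
```

Ordering by colX and then RowId makes the choice of surviving row deterministic even when colX repeats within a group.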

2

If you are OK with just keeping the minimum value of colX, you can do this:

delete t from t inner join 
    (select min(colx) mincolx, col1, col2, col3 
    from t 
    group by col1, col2, col3 
    having count(1) > 1) as duplicates 
    on (duplicates.col1 = t.col1 
    and duplicates.col2 = t.col2 
    and duplicates.col3 = t.col3 
    and duplicates.mincolx <> t.colx) 

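The problem is that you will still have rows in which all four columns are identical. To get rid of those, after running the first query, you have to use a temporary table.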

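-- Save one copy of every group that still has exact duplicates.
SELECT DISTINCT col1, col2, col3, colx
INTO temp
    FROM (SELECT col1, col2, col3, min(colx) AS colx
     FROM t
     GROUP BY col1, col2, col3
     HAVING count(1) > 1) subq;

-- Remove all copies of those groups from the original table.
DELETE FROM t WHERE EXISTS
    (SELECT 1 FROM temp
    WHERE temp.col1 = t.col1
     AND temp.col2 = t.col2
     AND temp.col3 = t.col3);

-- Put back a single copy of each group, then clean up.
INSERT INTO t (col1, col2, col3, colx)
SELECT col1, col2, col3, colx FROM temp;

DROP TABLE temp;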

Here's an example SQLFiddle.

0

I assume you are using SQL Server 2005/2008.

SELECT col1, 
     col2, 
     col3, 
     colx 
FROM 
    (SELECT *, 
      row_number() OVER (PARTITION BY col1,col2,col3 
          ORDER BY colx) AS r 
    FROM table_name) a 
WHERE r = 1; 
0

The simplest solution is as follows. Suppose we have a table emp_dept(empid, deptid) that contains duplicate rows. On an Oracle database:

delete from emp_dept where exists (select * from emp_dept i where i.empid = emp_dept.empid and i.deptid = emp_dept.deptid and i.rowid < emp_dept.rowid) 

For SQL Server, or any database that does not support rowid, we need to add an identity column to identify each row. Say we add an identity column nid to the table:

alter table emp_dept add nid int identity(1,1) -- to add identity column 

Now the query to delete the duplicates can be written as:

delete from emp_dept where exists (select * from emp_dept i where i.empid = emp_dept.empid and i.deptid = emp_dept.deptid and i.nid< emp_dept.nid) 

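The idea here is to delete every row for which another row exists with the same key values but a smaller rowid or identity. So if duplicate rows exist, the ones with the higher rowid or identity are deleted. For a row with no duplicates, no row with a lower rowid is found, so it is not deleted.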
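
One way to check the result, as a sketch using the same emp_dept table: the query below lists any (empid, deptid) pair that still appears more than once, so it should return no rows after the delete has run.

```sql
-- List key pairs that still have more than one row.
SELECT empid, deptid, COUNT(*) AS copies
FROM emp_dept
GROUP BY empid, deptid
HAVING COUNT(*) > 1;
```

Running it before the delete gives a count of how many copies each duplicated pair has.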

0

Try this code at your own risk:

Delete from Table_name 
WHERE Table_name.%%physloc%% 
     NOT IN (SELECT MAX(b.%%physloc%%) 
       FROM Table_name b 
       group by Col_1,Col_2) 

The second method uses ROW_NUMBER(), which is the safe approach:

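WITH CTE_Dup AS
(
    -- Number the rows within each (SalesOrderno, ItemNo) group.
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY SalesOrderno, ItemNo
                              ORDER BY SalesOrderno, ItemNo) AS ROW_NO
    FROM dbo.SalesOrderDetails
)
-- Delete every row except the first one in each group.
DELETE FROM CTE_Dup
WHERE ROW_NO > 1;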