2013-05-29 82 views

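Select duplicate values from an Oracle table

I have an Oracle table that, for various reasons, has no primary key. It has 5 columns, and I want to delete the duplicate records (two rows are duplicates if all 5 column values are the same). I came up with this SQL, but it doesn't seem to be picking up the duplicate values: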

SELECT DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
FROM table_name
GROUP BY DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
HAVING COUNT(*) > 1;

Sample records:

DATE_TIME | SITE | RESPONSE_TIME | AVAIL_PERCENT | AGENT
20-Apr-13 04.23.00.00 AM | Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] | 8.2610 | 100.00 | 45693
20-Apr-13 10.23.00.00 AM | Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] | 6.2900 | 100.00 | 45693
24-Apr-13 07.22.00.00 AM | Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] | 3.7300 | 100.00 | 45693
24-Apr-13 03.52.00.00 AM | Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] | 3.7180 | 100.00 | 45693
08-May-13 06.52.00.00 AM | Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] | 3.5970 | 100.00 | 45693
20-May-13 01.52.00.00 AM | Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] | 3.7910 | 100.00 | 45693
25-Apr-13 01.52.00.00 AM | Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] | 3.3400 | 100.00 | 45693
08-May-13 05.22.00.00 AM | Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] | 2.4410 | 100.00 | 45693
09-May-13 01.22.00.00 AM | Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] |  |  | 45693
21-May-13 06.52.00.00 AM | Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] | 3.5480 | 100.00 | 45693
23-Apr-13 02.23.00.00 AM | Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] | 10.7070 | 100.00 | 45693
26-Apr-13 09.22.00.00 AM | Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] | 4.0070 | 100.00 | 45693
26-Apr-13 03.52.00.00 AM | Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] | 3.9350 | 100.00 | 45693
22-May-13 12.52.00.00 PM | Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] | 4.1760 | 100.00 | 45693
23-Apr-13 02.53.00.00 AM | Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] | 6.9500 | 100.00 | 45693
23-Apr-13 03.23.00.00 AM | Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] | 6.0480 | 100.00 | 45693
23-Apr-13 04.23.00.00 AM | Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] | 6.7600 | 100.00 | 45693

Any ideas?


Can you post some sample records? –


Your SQL looks correct... are you sure the time and all the other fields are exact duplicates? – sgeddes


@sgeddes, when I run the SQL above I get the sample output. Those rows are not duplicates; all the values are different. – user1471980
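A point worth keeping in mind when reading that output: GROUP BY collapses each set of identical rows into a single output row, so the query in the question never shows repeated lines, even when it works. Every row it returns stands for two or more identical rows in the table. The sketch below, assuming the question's table_name and column names, makes this visible: the first query adds the number of copies per group, and the second (using an analytic COUNT) lists each physical copy with its rowid.

```sql
-- Each output row is one group of identical rows; DUP_COUNT is how many copies exist.
SELECT DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT,
       COUNT(*) AS DUP_COUNT
FROM table_name
GROUP BY DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
HAVING COUNT(*) > 1;

-- List every physical copy, with its rowid, for groups that have more than one row.
SELECT *
FROM (SELECT t.ROWID AS RID, t.*,
             COUNT(*) OVER (PARTITION BY DATE_TIME, SITE, RESPONSE_TIME,
                                         AVAIL_PERCENT, AGENT) AS DUP_COUNT
      FROM table_name t)
WHERE DUP_COUNT > 1
ORDER BY DATE_TIME, SITE, RID;
```

Like GROUP BY, PARTITION BY puts NULLs in the same group, so rows such as the 09-May-13 record with empty RESPONSE_TIME and AVAIL_PERCENT are still compared correctly.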

Answers


You can use the rowid as a pseudo primary key and delete the extra rows with a query such as:

delete from
    my_table
where
    rowid not in (
        select min(rowid)
        from my_table
        group by date_time,
                 site,
                 response_time,
                 avail_percent,
                 agent);

The GROUP BY list should contain every column that, taken together, defines a unique row (here, all five).

For very large data sets with many duplicates there may be better-performing options, but this is a quick method that is usually good enough.
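Note that the subquery returns the rowids to keep (the lowest rowid of each group, which includes every row that has no duplicate at all), so running it on its own will mostly show non-duplicate rows. To preview what the DELETE would actually remove, a minimal sketch, assuming the my_table name and the five columns from the question, is to select the complement:

```sql
-- Rows the DELETE would remove: every row except the lowest rowid in its group.
SELECT t.ROWID AS RID, t.*
FROM my_table t
WHERE t.ROWID NOT IN (SELECT MIN(ROWID)
                      FROM my_table
                      GROUP BY DATE_TIME, SITE, RESPONSE_TIME,
                               AVAIL_PERCENT, AGENT)
ORDER BY DATE_TIME, SITE, RID;
```

If this returns no rows, the DELETE would not remove anything either.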


That didn't work. Before deleting, I ran a select and saw that the records it returned were not duplicates. – user1471980


Do you mean you ran the subquery and found that it returned rowids of records that are not duplicates? –


Since you are on Oracle, you can try the following to delete the duplicates:

DELETE FROM my_table WHERE ROWID IN
(
    SELECT RID FROM
    (
        SELECT
            DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT, ROWID AS RID,
            ROW_NUMBER() OVER (PARTITION BY
                DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
                ORDER BY DATE_TIME) ITM_IDX
        FROM my_table
    )
    WHERE ITM_IDX > 1
);
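Before running that DELETE, the inner query can be run on its own to see which copies would be removed. In this sketch (assuming the same my_table and columns) the ORDER BY inside OVER uses ROWID instead of DATE_TIME: because all five columns are equal within a partition, ordering by one of them does not decide which copy gets number 1, while ROWID makes that choice repeatable between runs.

```sql
-- Preview: every copy after the first in each group of identical rows.
SELECT *
FROM (SELECT t.ROWID AS RID, t.*,
             ROW_NUMBER() OVER (PARTITION BY DATE_TIME, SITE, RESPONSE_TIME,
                                             AVAIL_PERCENT, AGENT
                                ORDER BY ROWID) AS ITM_IDX
      FROM my_table t)
WHERE ITM_IDX > 1;
```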

Yes, it is an Oracle table. I ran your select statement and got this error: "missing window specification for this function" – user1471980


@user1471980, please try the updated SQL. –


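Are you planning to create a primary key? You can create a table for exceptions, and Oracle will put the records that violate the primary key into it. If there are violations, the primary key itself will not be created, but you can analyze the bad data afterwards. =)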

create table tb1 
(field1 number, field2 varchar2(100)); 

--good data 
insert into tb1 values (1, 'a'); 
insert into tb1 values (1, 'b'); 
insert into tb1 values (1, 'c'); 
insert into tb1 values (2, 'a'); 
insert into tb1 values (2, 'b'); 
insert into tb1 values (2, 'c'); 
-- bad data 
insert into tb1 values (3, 'a'); 
insert into tb1 values (3, 'a'); 
commit; 

-- a table for exceptions 
create table tbl_exceptions (row_id rowid, 
          owner varchar2(30), 
          table_name varchar2(30), 
          constraint varchar2(30)); 

-- the primary key 
-- if it fails, the table contains duplicate rows
alter table tb1 add constraint pk1 primary key (field1, field2) 
exceptions into tbl_exceptions; 

-- list the offending rows
-- note: the join uses the ROW_ID column of the exceptions table
select tb1.* 
from tb1, 
     tbl_exceptions 
where tb1.rowid = tbl_exceptions.row_id;
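A possible follow-up, assuming the tb1 and tbl_exceptions setup above: the exceptions table records every row involved in a violation (both copies of (3, 'a') in this example), not only the extra one, so deleting everything it lists would remove both. This sketch keeps the lowest rowid for each key, deletes the rest, and then retries the constraint.

```sql
-- Delete the offending rows except the lowest rowid for each key.
DELETE FROM tb1
WHERE rowid IN (SELECT row_id FROM tbl_exceptions)
  AND rowid NOT IN (SELECT MIN(rowid)
                    FROM tb1
                    GROUP BY field1, field2);
COMMIT;

-- Clear the old exception entries, then try to add the primary key again.
TRUNCATE TABLE tbl_exceptions;

ALTER TABLE tb1 ADD CONSTRAINT pk1 PRIMARY KEY (field1, field2)
EXCEPTIONS INTO tbl_exceptions;
```

If the ALTER TABLE succeeds this time, tbl_exceptions stays empty and tb1 has its primary key.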