2017-01-26

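How to delete duplicates and update the records that reference them in SQL

I have two tables:

User: (int id, varchar unique username)

Items: (int id, varchar name, int user_id)

Currently, the user table contains case-insensitive duplicates, such as: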

1,John 
2,john 
3,sally 
4,saLlY 

The items table then contains:

1,myitem,1 
2,mynewitem,2 
3,my-item,3 
4,mynew-item,4 

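I have already updated the inserts into the user table so that my code always inserts lowercase usernames.

However, I need to migrate the database so that the duplicates are removed from the user table and the references in the items table are updated, so that users don't lose their items after the migration.

I.e. the result would be: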

User:

1,john 
3,sally 

Items:

1,myitem,1 
2,mynewitem,1 
3,my-item,3 
4,mynew-item,3 

Because the user table has a unique constraint, I can't just set every username to lowercase like this:

update public.user set username =lower(username) 
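The failure can be reproduced outside H2. A minimal sketch, using SQLite through Python's built-in `sqlite3` module as a stand-in (the table name `users` is a simplification of the question's `User` table): the statement is rejected as soon as lowering `John` collides with the existing `john`.

```python
import sqlite3


def try_lowercase_in_place() -> str:
    """Attempt the naive in-place lower() update and report what happened."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE)")
        conn.executemany(
            "INSERT INTO users (id, username) VALUES (?, ?)",
            [(1, "John"), (2, "john"), (3, "sally"), (4, "saLlY")],
        )
        try:
            conn.execute("UPDATE users SET username = lower(username)")
        except sqlite3.IntegrityError as exc:
            # 'John' -> 'john' (and 'saLlY' -> 'sally') collide with existing rows.
            return f"failed: {exc}"
        return "updated"
    finally:
        conn.close()


print(try_lowercase_in_place())
```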

I'm using the H2 database. – user171943


Update the items first so that they all point to the correct version of the user, then delete the users you no longer need. – Randy
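The order suggested in this comment (repoint the items, then delete) can be sketched as follows. This is a minimal illustration, not H2-specific: it runs on SQLite via Python's `sqlite3` module, uses simplified table names `users`/`items`, and assumes the lowest id in each case-insensitive group is the one to keep.

```python
import sqlite3

SAMPLE = """
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT,
                    user_id INTEGER REFERENCES users(id));
INSERT INTO users VALUES (1, 'John'), (2, 'john'), (3, 'sally'), (4, 'saLlY');
INSERT INTO items VALUES (1, 'myitem', 1), (2, 'mynewitem', 2),
                         (3, 'my-item', 3), (4, 'mynew-item', 4);
"""

MIGRATION = [
    # 1. Repoint each item at the lowest id among its user's case-insensitive duplicates.
    """UPDATE items SET user_id = (
           SELECT MIN(u2.id)
           FROM users u1
           JOIN users u2 ON lower(u2.username) = lower(u1.username)
           WHERE u1.id = items.user_id)""",
    # 2. Delete users that are not the lowest id of their group; nothing references them now.
    """DELETE FROM users
       WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY lower(username))""",
    # 3. The survivors are unique case-insensitively, so lowering them cannot collide.
    "UPDATE users SET username = lower(username)",
]


def migrate(conn: sqlite3.Connection) -> None:
    with conn:  # single transaction: commits on success, rolls back on error
        for statement in MIGRATION:
            conn.execute(statement)


conn = sqlite3.connect(":memory:")
conn.executescript(SAMPLE)
migrate(conn)
print(conn.execute("SELECT * FROM users ORDER BY id").fetchall())
# [(1, 'john'), (3, 'sally')]
print(conn.execute("SELECT * FROM items ORDER BY id").fetchall())
# [(1, 'myitem', 1), (2, 'mynewitem', 1), (3, 'my-item', 3), (4, 'mynew-item', 3)]
```

Deleting first would either fail on the foreign key or leave items pointing at users that no longer exist, which is why the update has to run before the delete.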


I could do this in Java or another programming language, but I'd like to know whether it can be done purely in SQL. – user171943

Answers


I'm not familiar with H2. You could try the following, which I wrote for SQL Server with a case-sensitive, accent-sensitive database.

create table t_user(id int not null identity(1,1), username varchar(25) unique); 
alter table t_user add constraint pk_id_user primary key(id); 

create table t_items(id int not null identity(1,1), name varchar(25), user_id int); 
alter table t_items add constraint pk_id_items primary key(id); 
alter table t_items add constraint fk_user_id foreign key(user_id) references t_user(id); 

insert into t_user (username) values ('John'), ('john'), ('sally'), ('saLlY'); 
insert into t_items (name, user_id) values ('myitem', 1), ('mynewitem', 2), ('my-item', 3), ('mynew-item',4); 

select * from t_user 
select * from t_items 

create table t_user_mig(id int not null identity(1,1), username varchar(25) unique); 
alter table t_user_mig add constraint pk_id_user_mig primary key(id); 

create table t_items_mig(id int not null identity(1,1), name varchar(25), user_id int); 
alter table t_items_mig add constraint pk_id_items_mig primary key(id); 
alter table t_items_mig add constraint fk_user_id_mig foreign key(user_id) references t_user_mig(id); 

insert into t_user_mig (username) select distinct lower(username) from t_user
insert into t_items_mig (name, user_id)
select ti.name, (select id from t_user_mig where username = lower(tu.username))
from t_items ti
inner join t_user tu on ti.user_id = tu.id

select * from t_user_mig 
select * from t_items_mig 

I replaced your tables User and Items with t_user and t_items. They are migrated into t_user_mig and t_items_mig.

You can try it in H2. I'd appreciate your feedback.

I hope it helps.
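The copy-into-new-tables idea from this answer can be tried without SQL Server. A minimal sketch on SQLite via Python's `sqlite3` (the table names `users_mig`/`items_mig` are hypothetical stand-ins for `t_user_mig`/`t_items_mig`): unlike the answer, it keeps the original item ids and attaches each item to the new user row by lowercase name.

```python
import sqlite3

SAMPLE = """
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT,
                    user_id INTEGER REFERENCES users(id));
INSERT INTO users VALUES (1, 'John'), (2, 'john'), (3, 'sally'), (4, 'saLlY');
INSERT INTO items VALUES (1, 'myitem', 1), (2, 'mynewitem', 2),
                         (3, 'my-item', 3), (4, 'mynew-item', 4);
"""


def migrate_to_new_tables(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute("CREATE TABLE users_mig (id INTEGER PRIMARY KEY, username TEXT UNIQUE)")
        conn.execute(
            "CREATE TABLE items_mig (id INTEGER PRIMARY KEY, name TEXT,"
            " user_id INTEGER REFERENCES users_mig(id))"
        )
        # One row per case-insensitive username; SQLite assigns the new ids.
        conn.execute(
            """INSERT INTO users_mig (username)
               SELECT lower(username) FROM users
               GROUP BY lower(username) ORDER BY MIN(id)"""
        )
        # Attach each item to the new user row matching its old owner's lowercase name.
        conn.execute(
            """INSERT INTO items_mig (id, name, user_id)
               SELECT i.id, i.name, m.id
               FROM items i
               JOIN users u ON u.id = i.user_id
               JOIN users_mig m ON m.username = lower(u.username)"""
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SAMPLE)
migrate_to_new_tables(conn)
rows = conn.execute(
    """SELECT i.name, m.username FROM items_mig i
       JOIN users_mig m ON m.id = i.user_id ORDER BY i.id"""
).fetchall()
print(rows)
# [('myitem', 'john'), ('mynewitem', 'john'), ('my-item', 'sally'), ('mynew-item', 'sally')]
```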


Update the items first:

update i
set userid = u2.userid
from items i
    inner join users u on i.userid = u.userid
    inner join (select userid, lower(username) as username,
                       row_number() over (partition by lower(username) order by userid) as rn
                from users) u2 on u2.username = lower(u.username) and u2.rn = 1

Then create the new user table based on the original one:

select userid, lower(username) as username
into NewUserTable
from (select userid, username,
             row_number() over (partition by lower(username) order by userid) as rn
      from users) u
where rn = 1

Not sure whether H2 supports window functions. I'll leave this as a SQL Server solution. – KeithL
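On the window-function question: SQLite has supported `ROW_NUMBER()` since version 3.25, so the same idea can be sketched through Python's `sqlite3` module. This assumes the bundled SQLite is at least 3.25; `users`, `items`, and `new_users` are simplified, hypothetical names, and `CREATE TABLE ... AS SELECT` stands in for SQL Server's `SELECT ... INTO`.

```python
import sqlite3

SAMPLE = """
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT,
                    user_id INTEGER REFERENCES users(id));
INSERT INTO users VALUES (1, 'John'), (2, 'john'), (3, 'sally'), (4, 'saLlY');
INSERT INTO items VALUES (1, 'myitem', 1), (2, 'mynewitem', 2),
                         (3, 'my-item', 3), (4, 'mynew-item', 4);
"""

# Rank users within each case-insensitive group; rn = 1 marks the lowest id (the keeper).
RANKED = """
    SELECT id, username,
           ROW_NUMBER() OVER (PARTITION BY lower(username) ORDER BY id) AS rn
    FROM users
"""


def migrate_with_row_number(conn: sqlite3.Connection) -> None:
    with conn:
        # Point every item at the keeper of its owner's group.
        conn.execute(
            f"""UPDATE items SET user_id = (
                    SELECT k.id
                    FROM users u
                    JOIN ({RANKED}) AS k
                      ON lower(k.username) = lower(u.username) AND k.rn = 1
                    WHERE u.id = items.user_id)"""
        )
        # Build the deduplicated, lowercased user table from the keepers only.
        conn.execute(
            f"""CREATE TABLE new_users AS
                SELECT id, lower(username) AS username FROM ({RANKED}) WHERE rn = 1"""
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SAMPLE)
migrate_with_row_number(conn)
print(conn.execute("SELECT * FROM new_users ORDER BY id").fetchall())
# [(1, 'john'), (3, 'sally')]
print(conn.execute("SELECT user_id FROM items ORDER BY id").fetchall())
# [(1,), (1,), (3,), (3,)]
```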


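The code below was tested with "H2 1.3.176 (2014-04-05) / embedded mode" in the web console. There are two queries that should solve the problem as you stated it, plus an additional preparatory statement that you should also consider, even though your data does not show the case it handles. The preparatory statement is explained later; let's start with the two main queries: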

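First, every items.userid is rewritten to point to the corresponding lowercase username entry. Let's call the lowercase entry main and a non-lowercase entry dup. Then every items.userid that refers to a dup.id is set to the corresponding main.id. A main entry corresponds to a dup entry if their names are equal when compared case-insensitively, i.e. main.name = lower(dup.name).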

Second, all duplicate entries in the user table are deleted. The dup entries are those with name <> lower(name).

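That covers the basic requirements. In addition, we should consider that for some users there may be only entries containing uppercase letters and no lowercase entry at all. To handle this case, we use a preparatory statement that, for each group of case-insensitively equal names, sets exactly one name (the one with the lowest id) to lowercase.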
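The preparatory statement only matters when a group has no lowercase spelling at all, like `Mary`/`mAry`. A minimal sketch that runs just that statement on SQLite via Python's `sqlite3` (SQLite standing in for H2; the data mirrors this answer's `usr` table): afterwards every group has exactly one lowercase "main" entry.

```python
import sqlite3

# The answer's preparatory statement, unchanged apart from the ui. qualifier in GROUP BY.
PREPARE = """
UPDATE usr SET name = lower(name)
WHERE id IN (SELECT min(ui.id) FROM usr ui
             WHERE lower(ui.name) NOT IN (SELECT ui2.name FROM usr ui2)
             GROUP BY lower(ui.name))
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usr (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO usr (id, name) VALUES (?, ?)",
    [(1, "John"), (2, "john"), (3, "sally"), (4, "saLlY"), (5, "Mary"), (6, "mAry")],
)
with conn:
    conn.execute(PREPARE)

# "main" entries are the rows whose name is already lowercase.
mains = [row[0] for row in conn.execute(
    "SELECT id FROM usr WHERE name = lower(name) ORDER BY id")]
print(mains)  # [2, 3, 5]
```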

drop table if exists usr; 

CREATE TABLE usr 
    (`id` int primary key, `name` varchar(5)) 
; 

INSERT INTO usr 
    (`id`, `name`) 
VALUES 
    (1, 'John'), 
    (2, 'john'), 
    (3, 'sally'), 
    (4, 'saLlY'), 
    (5, 'Mary'), 
    (6, 'mAry') 

; 

drop table if exists items; 

CREATE TABLE items 
    (`id` int, `name` varchar(10), `userid` int references usr (`id`)) 
; 

INSERT INTO items 
    (`id`, `name`, `userid`) 
VALUES 
    (1, 'myitem', 1), 
    (2, 'mynewitem', 2), 
    (3, 'my-item', 3), 
    (4, 'mynew-item', 4) 
; 

update usr set name = lower(name) where id in (select min(ui.id) as minid from usr ui where lower(ui.name) not in (select ui2.name from usr ui2) 
group by lower(name)); 

update items set userid = 
(select umain.id as mainid from usr udupl, usr umain 
where umain.name = lower(umain.name) 
    and lower(udupl.name) = lower(umain.name) 
    and udupl.id = userid 
); 

delete from usr where name <> lower(name); 

select * from usr; 

select * from items; 

Executing the statements above produces the following results:

select * from usr; 
ID | NAME
---|------
2  | john
3  | sally
5  | mary

select * from items;
ID | NAME       | USERID
---|------------|-------
1  | myitem     | 2
2  | mynewitem  | 2
3  | my-item    | 3
4  | mynew-item | 3

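If you update the item references correctly first, you can then delete the duplicate users. In the example below I keep the user with the minimum id as the correct one, if that's fine with you.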

--Prepare data 
create TABLE #users 
(id int primary key, username varchar(15)); 

INSERT INTO #users 
(id, username) 
select 
1, 'John' 
union all select 
2, 'john' 
union all select 
3, 'sally' 
union all select 
4, 'saLlY' 
union all select 
5, 'Mary' 
union all select 
6, 'mAry' 


create TABLE #items 
(itemid int, name varchar(10), userid int references #users (id)); 

INSERT INTO #items 
(itemid, name, userid) 
select 
1, 'myitem', 1 
union all select 
2, 'mynewitem', 2 
union all select 
3, 'my-item', 3 
union all select 
4, 'mynew-item', 4 
; 

--Update items: point each item at the minimum id of its case-insensitive username group
update i
set userid = t.minid
from #items i
inner join #users u on u.id = i.userid
inner join (
select min(id) as minid, lower(username) as newusername
from #users group by lower(username)) t
on t.newusername = lower(u.username)


--delete duplicates users, according to minimum id 
delete from #users where id not in (
select min(id) from #users group by lower(username)) 

--set the remaining users names to lower 
update #users 
set username = lower(username) 

--Clean temp data 
drop table #users 
drop table #items 

This was tested on SQL Server, but since you asked for plain SQL, I think it should work for you.


This code works perfectly on SQL Server.

Try it; it should help (you may need some simple modifications to suit your database engine):

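-- Map every duplicate (id2) to the lowest id in its case-insensitive group
SELECT MIN(U1.id) AS id, U2.id AS id2
INTO #User_Tmp
FROM [User] U1 JOIN [User] U2
ON LOWER(U2.username) = LOWER(U1.username)
AND U1.id < U2.id
GROUP BY U2.id

-- Repoint items that reference a duplicate
UPDATE It
SET It.user_id = U.id
FROM Items It
JOIN #User_Tmp U
ON U.id2 = It.user_id

-- The duplicates are now unreferenced and can be deleted
DELETE FROM [User]
WHERE id IN
(
    SELECT id2 FROM #User_Tmp
)

SELECT *
FROM [User]

SELECT *
FROM Items

DROP TABLE #User_Tmp;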

Hope this solves your problem.
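The mapping table built by the self-join can be inspected on its own. A minimal sketch on SQLite via Python's `sqlite3`, with a hypothetical third spelling `JOHN` added to the sample data: the plain pair join lists id 3 twice (once against 1 and once against 2), whereas grouping by `U2.id` and taking `MIN(U1.id)` gives every duplicate exactly one target, the lowest id in its group.

```python
import sqlite3

# Every (lower id, higher id) pair of case-insensitive duplicates.
PAIRS = """
SELECT u1.id, u2.id FROM users u1
JOIN users u2 ON lower(u2.username) = lower(u1.username) AND u1.id < u2.id
ORDER BY u2.id, u1.id
"""

# One row per duplicate, mapped to the lowest id of its group.
MIN_PAIRS = """
SELECT MIN(u1.id), u2.id FROM users u1
JOIN users u2 ON lower(u2.username) = lower(u1.username) AND u1.id < u2.id
GROUP BY u2.id
ORDER BY u2.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO users (id, username) VALUES (?, ?)",
    [(1, "John"), (2, "john"), (3, "JOHN"), (4, "sally"), (5, "saLlY")],
)
pairs = conn.execute(PAIRS).fetchall()
min_pairs = conn.execute(MIN_PAIRS).fetchall()
print(pairs)      # [(1, 2), (1, 3), (2, 3), (4, 5)]
print(min_pairs)  # [(1, 2), (1, 3), (4, 5)]
```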

BEGIN TRAN 
CREATE TABLe #User (UserID Int, UserName Nvarchar(255)) 

INSERT INTO #USER 
SELECT 1,'John' UNION ALL 
SELECT 2,'John' UNION ALL 
SELECT 3,'sally' UNION ALL 
SELECT 4,'saLlY' 

CREATE TABLE #items 
(itemid int, name varchar(10), userid int); 

INSERT INTO #items 
(itemid, name, userid) 
select 
1, 'myitem', 1 
union all select 
2, 'mynewitem', 2 
union all select 
3, 'my-item', 3 
union all select 
4, 'mynew-item', 4 

GO 
WITH CTE (UserID, DuplicateCount)
AS
(
    SELECT UserID,
    ROW_NUMBER() OVER(PARTITION BY LOWER(UserName)
    ORDER BY UserID) AS DuplicateCount
    FROM #User 

) 
Delete from CTE Where DuplicateCount > 1 

Select * from #User 

Select * from #items 

ROLLBACK TRAN 

After deleting the duplicate records, you can simply update the tables.


Try using a MERGE statement; it can both find the duplicates and update the duplicated values. Its general form is:

MERGE [INTO] <target table>

USING <source table or table expression>

ON <join/merge predicate> (semantics similar to outer join)

WHEN MATCHED <statement to run when match found in target>

WHEN NOT MATCHED [BY TARGET] <statement to run when no match found in target>;