4

I have a table author_data and found that one of its columns contains duplicate values across rows:

author_id | author_name 
----------+---------------- 
9   | ernest jordan 
14  | k moribe 
15  | ernest jordan 
25  | william h nailon 
79  | howard jason 
36  | k moribe 

The result I need is:

author_id | author_name             
----------+---------------- 
9   | ernest jordan 
15  | ernest jordan  
14  | k moribe 
36  | k moribe 

That is, I need the author_id of every row whose name appears more than once. I tried this statement:

select author_id,count(author_name) 
from author_data 
group by author_name 
having count(author_name)>1 

But it does not work. How can I get this result?

Answers

8

My suggestion would be a window function in a subquery:

SELECT author_id, author_name -- omit the name here, if you just need ids 
FROM (
    SELECT author_id, author_name 
     , count(*) OVER (PARTITION BY author_name) AS ct 
    FROM author_data 
    ) sub 
WHERE ct > 1; 

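You will recognize the basic aggregate function count(). It can be turned into a window function by appending an OVER clause, just like any other aggregate function.

This way it counts the rows in each partition, and every row carries the count for its own author_name. Voilà.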
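
To make the per-partition count visible, here is a minimal sketch that runs only the inner query against the sample rows from the question. It assumes an in-memory SQLite database (via Python's sqlite3 module, built against SQLite 3.25 or newer for window function support) as a stand-in for the asker's database; the helper name partition_counts is hypothetical.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (9, "ernest jordan"),
    (14, "k moribe"),
    (15, "ernest jordan"),
    (25, "william h nailon"),
    (79, "howard jason"),
    (36, "k moribe"),
]


def partition_counts():
    """Return (author_id, author_name, ct) where ct counts rows sharing the name."""
    # Assumption: SQLite stands in for the real database; needs SQLite >= 3.25.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author_data (author_id INTEGER, author_name TEXT)")
    conn.executemany("INSERT INTO author_data VALUES (?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT author_id, author_name,
               count(*) OVER (PARTITION BY author_name) AS ct
        FROM author_data
        ORDER BY author_name, author_id
        """
    ).fetchall()
    conn.close()
    return rows


for row in partition_counts():
    print(row)
```

Rows with ct = 2 are the duplicates; the outer query's WHERE ct > 1 simply keeps those and drops the rest.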

In older versions without window functions (v8.3 or older), or more generally, this alternative performs quite fast:

SELECT author_id, author_name -- omit name, if you just need ids 
FROM author_data a 
WHERE EXISTS (
    SELECT 1 
    FROM author_data a2 
    WHERE a2.author_name = a.author_name 
    AND a2.author_id <> a.author_id 
    ); 

If you are concerned about performance, add an index on author_name.
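
As a rough sketch of that advice, the following creates such an index and then runs the EXISTS query on the sample data. It again assumes an in-memory SQLite database through Python's sqlite3 module rather than the asker's actual server; the index name author_data_author_name_idx and the helper name duplicated_authors_with_index are made up for illustration.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (9, "ernest jordan"),
    (14, "k moribe"),
    (15, "ernest jordan"),
    (25, "william h nailon"),
    (79, "howard jason"),
    (36, "k moribe"),
]


def duplicated_authors_with_index():
    """Create an index on author_name, then list rows whose name occurs elsewhere."""
    # Assumption: SQLite stands in for the real database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author_data (author_id INTEGER, author_name TEXT)")
    conn.executemany("INSERT INTO author_data VALUES (?, ?)", ROWS)
    # Hypothetical index name; the lookup on a2.author_name can use this index.
    conn.execute(
        "CREATE INDEX author_data_author_name_idx ON author_data (author_name)"
    )
    rows = conn.execute(
        """
        SELECT author_id, author_name
        FROM author_data a
        WHERE EXISTS (
            SELECT 1
            FROM author_data a2
            WHERE a2.author_name = a.author_name
            AND a2.author_id <> a.author_id
            )
        ORDER BY author_name, author_id
        """
    ).fetchall()
    conn.close()
    return rows


for row in duplicated_authors_with_index():
    print(row)
```

On six rows the index makes no measurable difference; it only starts to matter once the table is large enough that scanning it for every outer row becomes expensive.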

1

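You are already halfway there. You just need to take the author names you have identified and fetch the rest of the data. (Selecting author_id while grouping by author_name does not work, because author_id is neither grouped nor aggregated.)

Try this:

SELECT author_id, author_name 
FROM author_data 
WHERE author_name IN (SELECT author_name 
     FROM author_data 
     GROUP BY author_name 
     HAVING count(*) > 1); 

1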

You can join the table to itself, which can be done with the following query:

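-- DISTINCT avoids repeated rows when a name occurs more than twice 
SELECT DISTINCT a1.author_id, a1.author_name 
FROM author_data a1 
INNER JOIN author_data a2 
    ON a1.author_id <> a2.author_id 
    AND a1.author_name = a2.author_name 
ORDER BY a1.author_name, a1.author_id; 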

-- 9 |ernest jordan 
-- 15|ernest jordan 
-- 14|k moribe 
-- 36|k moribe 

--OR 

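-- Same result with a cross join filtered in WHERE 
SELECT DISTINCT a1.author_id, a1.author_name 
FROM author_data a1 
CROSS JOIN author_data a2 
WHERE a1.author_id <> a2.author_id 
    AND a1.author_name = a2.author_name 
ORDER BY a1.author_name, a1.author_id; 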

-- 9 |ernest jordan 
-- 15|ernest jordan 
-- 14|k moribe 
-- 36|k moribe