2016-12-02 51 views
0

Function to flag duplicates in a query in PostgreSQL

I want to write a function that flags duplicates in specified columns of a PostgreSQL table.

For example, if I have the following table:

country | landscape | household 
-------------------------------- 
TZA  | L01  | HH02 
TZA  | L01  | HH03 
KEN  | L02  | HH01 
RWA  | L03  | HH01 

I would like to be able to run the following query:

SELECT country, 
     landscape, 
     household, 
     flag_duplicates(country, landscape) AS flag 
FROM mytable 

and get the following result:

country | landscape | household | flag 
--------------------------------------- 
TZA  | L01  | HH02  | duplicated 
TZA  | L01  | HH03  | duplicated 
KEN  | L02  | HH01  | 
RWA  | L03  | HH01  | 

Inside the function body, I think I need something like:

IF (country || landscape IN (SELECT country || landscape FROM mytable 
          GROUP BY country || landscape 
          HAVING count(*) > 1)) THEN 'duplicated' 
ELSE NULL 

But I am confused about how to pass all of this in as arguments. I would appreciate any help. I am using PostgreSQL version 9.3.

Answers

1

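You don't need a function for this. For performance reasons, calling a function for every row of the result set is not a good idea. A better solution is to use plain SQL (even with a subquery) and give the database engine the chance to optimize it. In your example, it would look like this: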

SELECT t.country, t.landscape, t.household,
       CASE WHEN duplicates.count > 1 THEN 'duplicated' END AS flag
FROM mytable t
JOIN (
    -- one row per (country, landscape) pair, with its row count
    SELECT country, landscape, count(household) AS count
    FROM mytable GROUP BY country, landscape
) duplicates ON duplicates.country = t.country AND duplicates.landscape = t.landscape;

It produces exactly the same result.

Update: if you want to use a function at all costs, here is a working example:

CREATE FUNCTION find_duplicates(arg_country varchar, arg_landscape varchar) RETURNS varchar AS $$
BEGIN
    -- 'duplicated' if more than one row shares this (country, landscape) pair, else NULL
    RETURN (
        SELECT CASE WHEN count(household) > 1 THEN 'duplicated' END
        FROM mytable
        WHERE country = arg_country AND landscape = arg_landscape
    );
END
$$
LANGUAGE plpgsql STABLE;
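
To try the function against the sample data from the question, something like the following sketch could be used. The column types (varchar) are an assumption, since the question does not show the table definition, and the function find_duplicates from this answer is assumed to have been created already.

```sql
-- Hypothetical setup reproducing the sample table from the question;
-- the varchar column types are assumed, not taken from the original schema.
CREATE TABLE mytable (country varchar, landscape varchar, household varchar);

INSERT INTO mytable (country, landscape, household) VALUES
    ('TZA', 'L01', 'HH02'),
    ('TZA', 'L01', 'HH03'),
    ('KEN', 'L02', 'HH01'),
    ('RWA', 'L03', 'HH01');

-- The function is evaluated once per row, so mytable is scanned again for each row.
SELECT country,
       landscape,
       household,
       find_duplicates(country, landscape) AS flag
FROM mytable;
```

Because the function runs a separate lookup for every row, this is likely to scale worse than the plain SQL join on larger tables.
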
0
select 
    *, 
    (count(*) over (partition by country, landscape)) > 1 as flag 
from 
    mytable; 

For the function, see @MarcinH's answer, but add stable to the function definition to make calls to it faster.
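
The window-function query above returns a boolean flag. If the text flag shown in the question is wanted instead, a minimal sketch (assuming the same mytable with columns country, landscape and household) wraps the window count in a CASE expression:

```sql
-- count(*) OVER (...) gives each row the size of its (country, landscape) group
-- without collapsing the rows, so no join or subquery is needed.
SELECT country,
       landscape,
       household,
       CASE WHEN count(*) OVER (PARTITION BY country, landscape) > 1
            THEN 'duplicated'   -- the group has more than one row
       END AS flag              -- CASE without ELSE yields NULL otherwise
FROM mytable;
```

Window functions are available in PostgreSQL 9.3, so this should work on the version mentioned in the question.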