2016-06-08
0

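How to use Count as a criterion in PostgreSQL

I have an existing table1 that contains "account", "tax_year", and other fields. I want to create a table2 holding the records from table1 for which the frequency of CONCAT(account, tax_year) is 1 and which satisfy the WHERE clause.

For example, if table1 looks like this: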

account year 
aaa 2014 
bbb 2016 
bbb 2016 
ddd 2014 
ddd 2014 
ddd 2015 

table2 should be:

account year 
aaa 2014 
ddd 2015 

Here is my script:

DROP TABLE IF EXISTS table2; 
CREATE TABLE table2 AS 
SELECT 
    account::text, 
    tax_year::text, 
    building_number, 
    imprv_type, 
    building_style_code, 
    quality, 
    quality_description, 
    date_erected, 
    yr_remodel, 
    actual_area, 
    heat_area, 
    gross_area, 
    CONCAT(account, tax_year) AS unq 
FROM table1 
WHERE imprv_type=1001 and date_erected>0 and date_erected IS NOT NULL and quality IS NOT NULL and quality_description IS NOT NULL and yr_remodel>0 and yr_remodel IS NOT NULL and heat_area>0 and heat_area IS NOT NULL 
GROUP BY account, 
    tax_year, 
    building_number, 
    imprv_type, 
    building_style_code, 
    quality, 
    quality_description, 
    date_erected, 
    yr_remodel, 
    actual_area, 
    heat_area, 
    gross_area, 
    unq 
HAVING COUNT(unq)=1; 

I have spent two days on this and still cannot figure out how to get it right. Thank you for your help!

Answers

0

The correct way to count the (account, tax_year) pairs in table1:

select account, tax_year 
from table1 
where imprv_type=1001 -- and many more... 
group by account, tax_year 
having count(*) = 1; 
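
As a quick check, the grouped count can be tried on the six sample rows from the question. This is only a minimal sketch: the scratch table name `sample_t1` is made up for illustration, `tax_year` is assumed to be an integer, and the filter columns are left out. It also shows why the script in the question does not work: grouping by every selected column counts fully identical rows, not (account, tax_year) pairs.

```sql
-- Hypothetical scratch table holding the six sample rows from the question.
CREATE TEMP TABLE sample_t1 (account text, tax_year int);

INSERT INTO sample_t1 (account, tax_year) VALUES
    ('aaa', 2014),
    ('bbb', 2016),
    ('bbb', 2016),
    ('ddd', 2014),
    ('ddd', 2014),
    ('ddd', 2015);

-- Keep only the (account, tax_year) pairs that occur exactly once.
SELECT account, tax_year
FROM sample_t1
GROUP BY account, tax_year
HAVING count(*) = 1
ORDER BY account, tax_year;
```

On this data the query keeps the pairs (aaa, 2014) and (ddd, 2015), which matches the table2 the question asks for.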

So you should try:

create table table2 as 
select * 
from table1 
where (account, tax_year) in (
    select account, tax_year 
    from table1 
    where imprv_type=1001 -- and many more... 
    group by account, tax_year 
    having count(*) = 1 
    ); 
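
One thing to watch with the query above: the outer `select *` does not repeat the filter, so when a pair has two rows and only one of them passes the `where`, both rows end up in table2. A possible alternative is a window count computed in one pass over the filtered rows. This is a sketch, not the answer author's method; it assumes that pairs should be counted among the filtered rows only, and `table2_alt`, `filtered`, and `pair_count` are names made up here.

```sql
-- Sketch only: count each (account, tax_year) pair among the filtered rows,
-- then keep the rows whose pair occurs exactly once.
CREATE TABLE table2_alt AS
SELECT *
FROM (
    SELECT t.*,
           count(*) OVER (PARTITION BY account, tax_year) AS pair_count
    FROM table1 AS t
    WHERE imprv_type = 1001   -- plus the remaining conditions from the question
) AS filtered
WHERE pair_count = 1;
```

The resulting table carries an extra `pair_count` column, which can be dropped afterwards. Whether this is faster on a large table is not something this thread establishes; the window count still needs a sort or hash over the filtered rows.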
+0

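Thank you! My source table has 11,755,200 rows and 71 columns. The query has been running for 20 hours and is still going. Is it normal for a dataset of this size to take so long to process? I am new to Postgres – 12B01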

+0

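This query is indeed expensive. The server has most likely run out of memory and started swapping. With a table of this size you should use a special approach, e.g. run the work in stages by using the where clause to split the data into smaller logical parts. – klin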
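
The staged approach described in this comment could look roughly like the following. It is a sketch under assumptions: `tax_year` is taken to be numeric, 2014 is just an example value, the index name is invented, and the index itself is an optional extra that the comment does not mention. Splitting by `tax_year` is safe here because a given (account, tax_year) pair can never span two years.

```sql
-- Optional and hypothetical: an index on the key pair may help each stage.
CREATE INDEX table1_account_year_idx ON table1 (account, tax_year);

-- Create an empty table2 with the same columns as table1.
CREATE TABLE table2 (LIKE table1);

-- One stage: handle a single tax year, then repeat for each other year.
INSERT INTO table2
SELECT *
FROM table1
WHERE tax_year = 2014
  AND (account, tax_year) IN (
      SELECT account, tax_year
      FROM table1
      WHERE tax_year = 2014
        AND imprv_type = 1001   -- and the remaining conditions
      GROUP BY account, tax_year
      HAVING count(*) = 1
  );
```

Each stage touches only one year's rows, so the grouping works on a much smaller set at a time.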

0

COUNT() = 1 is equivalent to NOT EXISTS (another row with the same key fields)

SELECT 
    account, tax_year 
    -- ... maybe more fields ... 
FROM table1 t1 
WHERE NOT EXISTS (SELECT * 
    FROM table1 nx 
    WHERE nx.account = t1.account -- same key field(s) 
    AND nx.tax_year = t1.tax_year 
    AND nx.ctid <> t1.ctid   -- but a different row! 
    ); 

Note: I replaced the concatenated key fields in COUNT(CONCAT(account, tax_year)) with a composite match key.
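
To see what the NOT EXISTS form returns, it can be run against the sample rows from the question. A minimal sketch, with the scratch table name `sample_nx` made up for illustration and `tax_year` assumed to be an integer:

```sql
-- Hypothetical scratch table holding the six sample rows from the question.
CREATE TEMP TABLE sample_nx (account text, tax_year int);

INSERT INTO sample_nx (account, tax_year) VALUES
    ('aaa', 2014),
    ('bbb', 2016),
    ('bbb', 2016),
    ('ddd', 2014),
    ('ddd', 2014),
    ('ddd', 2015);

-- A row survives only if no other physical row shares its key pair.
SELECT t1.account, t1.tax_year
FROM sample_nx AS t1
WHERE NOT EXISTS (
    SELECT 1
    FROM sample_nx AS nx
    WHERE nx.account = t1.account
      AND nx.tax_year = t1.tax_year
      AND nx.ctid <> t1.ctid      -- ctid identifies the physical row
)
ORDER BY t1.account, t1.tax_year;
```

Both bbb 2016 rows and both ddd 2014 rows find a twin with a different ctid and are dropped, leaving aaa 2014 and ddd 2015. One difference from the GROUP BY form: `=` never matches NULL, so rows with a NULL account or tax_year always count as unique here, while GROUP BY puts NULLs into one group.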

+0

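Thanks for the quick reply! I think your query will return all distinct records, not just the records where the frequency of ("account", "tax_year") = 1. Taking table1 from my question as an example, NOT EXISTS would return aaa 2014, bbb 2016, ddd 2014, ddd 2015. But what I really need is only aaa 2014 and ddd 2015 – 12B01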

+0

You can add the extra conditions to the where clause (note: you don't need GROUP BY, because this method doesn't use aggregate functions) – wildplasser
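
Adding the extra conditions to the NOT EXISTS query might look like the sketch below. Where the filter goes is an assumption: repeating it inside the subquery means duplicates are counted among the filtered rows only, while leaving it out means any other row with the same pair disqualifies the record.

```sql
SELECT t1.*
FROM table1 AS t1
WHERE t1.imprv_type = 1001          -- plus the remaining conditions
  AND NOT EXISTS (
      SELECT 1
      FROM table1 AS nx
      WHERE nx.account = t1.account
        AND nx.tax_year = t1.tax_year
        AND nx.ctid <> t1.ctid      -- a different physical row
        -- Assumption: repeat the filter so that only filtered rows count
        -- as duplicates; remove this line for the stricter reading.
        AND nx.imprv_type = 1001
  );
```

No GROUP BY appears anywhere, which is the point made in the comment above.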
