2016-07-05
0

Speed up COUNT(DISTINCT)

My query returns the number of non-null values for each field:

SELECT COUNT(field1) AS field1, COUNT(field2) AS field2, COUNT(field3) AS field3 
FROM (
    SELECT field1, field2, field3 
    FROM table1, table2 
    WHERE table1.id=table2.idt1 
    ORDER BY table1.id ASC 
    LIMIT 10000 
) AS rq 

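table1.id is the primary key of table1, and table2.idt1 is a secondary key of table2. This query works fine, but if I need to return the number of distinct values for each field, like this: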

SELECT COUNT(DISTINCT(field1)) AS field1, COUNT(DISTINCT(field2)) AS field2, COUNT(DISTINCT(field3)) AS field3 
FROM (
    SELECT field1, field2, field3 
    FROM table1, table2 
    WHERE table1.id=table2.idt1 
    ORDER BY table1.id ASC 
    LIMIT 10000 
) AS rq 

the problems start... The query does the job, but performance is of course much slower than without the DISTINCT clause.
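To make the difference between the two queries concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Postgres. The table `rq` and its four sample rows are hypothetical; they only mirror the subquery alias used above. It shows that COUNT(col) counts non-NULL values while COUNT(DISTINCT col) has to de-duplicate each column first, which is the extra work that slows the second query down. It says nothing about actual Postgres timings.

```python
import sqlite3

# Hypothetical sample data standing in for the 10000-row subquery "rq".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rq (field1 TEXT, field2 TEXT, field3 TEXT)")
conn.executemany(
    "INSERT INTO rq VALUES (?, ?, ?)",
    [
        ("a", "x", None),
        ("a", "y", None),
        ("b", "y", "k"),
        (None, "y", "k"),
    ],
)

# COUNT(col) ignores NULLs but counts duplicates.
plain = conn.execute(
    "SELECT COUNT(field1), COUNT(field2), COUNT(field3) FROM rq"
).fetchone()

# COUNT(DISTINCT col) ignores NULLs and counts each value once.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT field1), COUNT(DISTINCT field2), "
    "COUNT(DISTINCT field3) FROM rq"
).fetchone()

print(plain)     # non-NULL values per column
print(distinct)  # distinct non-NULL values per column
```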

table1 and table2 are indexed with B-trees:

CREATE INDEX field1_index ON table1 USING btree (field1);
CREATE INDEX field2_index ON table1 USING btree (field2);
CREATE INDEX field3_index ON table2 USING btree (field3);

How can I speed up these distinct counts for each field? Maybe with better indexes?

Thanks for your help.

+2

You should learn to use proper, explicit 'JOIN' syntax. –

+0

'DISTINCT' is ***not*** a function –

+0

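What is the point of doing a join and then counting distinct values in rows that were just duplicated by that very join? Wouldn't it be better to count without joining? – Tomalak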
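The point of this comment can be sketched with a small experiment, again using Python's sqlite3 as a stand-in for Postgres. The rows are hypothetical and the LIMIT from the question is left out on purpose: the join multiplies rows of table1, but the distinct count of field1 is the same as counting over the table1 rows that have at least one match. With the LIMIT applied to the joined rows, the two queries are no longer guaranteed to agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT);
    CREATE TABLE table2 (idt1 INTEGER, field3 TEXT);
    -- Hypothetical data: id 1 matches three rows, id 2 one row, id 3 none.
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'b');
    INSERT INTO table2 VALUES (1, 'k'), (1, 'k'), (1, 'm'), (2, 'k');
""")

# After the join, field1 = 'a' appears three times.
joined_rows, joined_distinct = conn.execute("""
    SELECT COUNT(field1), COUNT(DISTINCT field1)
    FROM table1 JOIN table2 ON table1.id = table2.idt1
""").fetchone()

# Same distinct count without materialising the duplicated rows.
(unjoined_distinct,) = conn.execute("""
    SELECT COUNT(DISTINCT field1)
    FROM table1
    WHERE EXISTS (SELECT 1 FROM table2 WHERE table2.idt1 = table1.id)
""").fetchone()

print(joined_rows, joined_distinct, unjoined_distinct)
```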

Answers

0

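Postgres does not optimize COUNT(DISTINCT) very well. You have several such expressions, which makes it a bit harder. I would suggest using window functions and conditional aggregation: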

SELECT SUM(CASE WHEN seqnum_1 = 1 THEN 1 ELSE 0 END) as field1,
       SUM(CASE WHEN seqnum_2 = 1 THEN 1 ELSE 0 END) as field2,
       SUM(CASE WHEN seqnum_3 = 1 THEN 1 ELSE 0 END) as field3
FROM (SELECT field1, field2, field3,
             ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY field1) as seqnum_1,
             ROW_NUMBER() OVER (PARTITION BY field2 ORDER BY field2) as seqnum_2,
             ROW_NUMBER() OVER (PARTITION BY field3 ORDER BY field3) as seqnum_3
      FROM table1 JOIN
           table2
           ON table1.id = table2.idt1
      ORDER BY table1.id ASC
      LIMIT 10000
     ) rq

Edit:

It occurred to me that row_number() is probably evaluated before the limit. Try this version:

SELECT SUM(CASE WHEN seqnum_1 = 1 THEN 1 ELSE 0 END) as field1,
       SUM(CASE WHEN seqnum_2 = 1 THEN 1 ELSE 0 END) as field2,
       SUM(CASE WHEN seqnum_3 = 1 THEN 1 ELSE 0 END) as field3
FROM (SELECT field1, field2, field3,
             ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY field1) as seqnum_1,
             ROW_NUMBER() OVER (PARTITION BY field2 ORDER BY field2) as seqnum_2,
             ROW_NUMBER() OVER (PARTITION BY field3 ORDER BY field3) as seqnum_3
      FROM (SELECT field1, field2, field3
            FROM table1 JOIN
                 table2
                 ON table1.id = table2.idt1
            ORDER BY table1.id ASC
            LIMIT 10000
           ) t
     ) rq
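One thing worth checking before swapping this in for COUNT(DISTINCT): PARTITION BY puts all NULLs into one partition, so the row-number sum counts NULL as one extra "value", whereas COUNT(DISTINCT) ignores NULLs. The sketch below illustrates this with Python's sqlite3 (window functions need SQLite 3.25 or later) as a stand-in for Postgres. The table `joined`, its rows, the single column, and the LIMIT of 5 are all hypothetical simplifications, and the `IS NOT NULL` guard is a suggested adjustment rather than part of the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the result of joining table1 and table2.
conn.execute("CREATE TABLE joined (id INTEGER PRIMARY KEY, field1 TEXT)")
conn.executemany(
    "INSERT INTO joined VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, None), (4, "b"), (5, None), (6, "c")],
)

# Limit first, as in the edited query (LIMIT 5 instead of 10000).
LIMITED = "SELECT field1 FROM joined ORDER BY id ASC LIMIT 5"

NUMBERED = f"""
    SELECT field1,
           ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY field1) AS seqnum_1
    FROM ({LIMITED}) AS t
"""

(count_distinct,) = conn.execute(
    f"SELECT COUNT(DISTINCT field1) FROM ({LIMITED}) AS rq"
).fetchone()

# The answer's technique: the NULL partition also has a row with seqnum_1 = 1.
(row_number_sum,) = conn.execute(
    f"SELECT SUM(CASE WHEN seqnum_1 = 1 THEN 1 ELSE 0 END) FROM ({NUMBERED}) AS rq"
).fetchone()

# Suggested guard so the result matches COUNT(DISTINCT) when NULLs exist.
(guarded_sum,) = conn.execute(
    "SELECT SUM(CASE WHEN seqnum_1 = 1 AND field1 IS NOT NULL THEN 1 ELSE 0 END) "
    f"FROM ({NUMBERED}) AS rq"
).fetchone()

print(count_distinct, row_number_sum, guarded_sum)
```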
+0

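Thanks @Gordon, but in my case it seems too slow, even worse. It has been running for several minutes and has not finished yet. With the DISTINCT clause it takes 5 minutes. – Macbernie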

+0

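In fact, your edited approach works well, but it takes 818 calls, versus 437 calls for the basic DISTINCT clause – Macbernie

+0

@Macbernie . . . What are the types of these fields? A "count(distinct)" on ten thousand rows should not take too long. –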

0

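I tried something similar on a big table (12 million rows).

Without DISTINCT it takes 10 seconds.

With DISTINCT, as in your code, it takes 19 seconds.

Putting the DISTINCT inside the subquery takes 11 seconds: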

SELECT COUNT(field1) AS field1, COUNT(field2) AS field2, COUNT(field3) AS field3 
FROM (
    -- DISTINCT is a keyword that applies once to the whole row, not a per-column function
    SELECT DISTINCT field1, field2, field3 
    FROM table1, table2 
    WHERE table1.id=table2.idt1 
    -- ORDER BY table1.id is dropped: Postgres rejects ORDER BY on a column missing from a SELECT DISTINCT list
    LIMIT 10000 
) AS rq 
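Note that DISTINCT in a SELECT list works on whole rows, so a subquery of this shape yields the distinct (field1, field2, field3) combinations, and the outer COUNTs then count those rows rather than the distinct values of each column. A minimal sketch of the difference, using Python's sqlite3 as a stand-in for Postgres with a hypothetical table `rq` and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rq (field1 TEXT, field2 TEXT, field3 TEXT)")
conn.executemany(
    "INSERT INTO rq VALUES (?, ?, ?)",
    [
        ("a", "x", "k"),
        ("a", "x", "k"),  # exact duplicate row
        ("a", "y", "k"),
        ("b", "y", "k"),
    ],
)

# Per-column distinct counts, as in the question.
per_column = conn.execute(
    "SELECT COUNT(DISTINCT field1), COUNT(DISTINCT field2), "
    "COUNT(DISTINCT field3) FROM rq"
).fetchone()

# Row-level DISTINCT in a subquery, then plain COUNT per column.
row_level = conn.execute("""
    SELECT COUNT(field1), COUNT(field2), COUNT(field3)
    FROM (SELECT DISTINCT field1, field2, field3 FROM rq) AS d
""").fetchone()

print(per_column)  # distinct values in each column
print(row_level)   # number of distinct rows, repeated per column
```

The two results agree only when every column happens to have as many distinct values as there are distinct rows, so the faster timing does not by itself mean the queries are interchangeable.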

Another thing: if you only want to filter out NULL data, you can do that in the WHERE clause instead of using DISTINCT.