2017-05-12

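DISTINCT with ORDER BY very slow

I am using Postgres for the first time and finding it quite slow to run DISTINCT and GROUP BY queries. Right now I am trying to find the latest record for each device and whether it is working. This is the first query I came up with: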

SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working 
     FROM call_logs c 
     ORDER BY c.device_id, c.timestamp desc 

It works, but it takes a long time to run.

Unique (cost=94840.24..97370.54 rows=11 width=17) (actual time=424.424..556.253 rows=13 loops=1)
  -> Sort (cost=94840.24..96105.39 rows=506061 width=17) (actual time=424.423..531.905 rows=506061 loops=1)
        Sort Key: device_id, "timestamp" DESC
        Sort Method: external merge Disk: 13272kB
        -> Seq Scan on call_logs c (cost=0.00..36512.61 rows=506061 width=17) (actual time=0.059..162.932 rows=506061 loops=1)
Planning time: 0.152 ms
Execution time: 557.957 ms
(7 rows)

I have rewritten the query as follows, which is faster but very ugly:

SELECT c.device_id, c.timestamp, c.working FROM call_logs c 
    INNER JOIN (SELECT c.device_id, MAX(c.timestamp) AS timestamp 
               FROM call_logs c 
               GROUP BY c.device_id) 
               newest on newest.timestamp = c.timestamp 

And its analysis:

Nested Loop (cost=39043.34..39136.08 rows=12 width=17) (actual time=216.406..216.580 rows=15 loops=1)
  -> HashAggregate (cost=39042.91..39043.02 rows=11 width=16) (actual time=216.347..216.351 rows=13 loops=1)
        Group Key: c_1.device_id
        -> Seq Scan on call_logs c_1 (cost=0.00..36512.61 rows=506061 width=16) (actual time=0.026..125.482 rows=506061 loops=1)
  -> Index Scan using call_logs_timestamp on call_logs c (cost=0.42..8.44 rows=1 width=17) (actual time=0.016..0.016 rows=1 loops=13)
        Index Cond: ("timestamp" = (max(c_1."timestamp")))
Planning time: 0.318 ms
Execution time: 216.631 ms
(8 rows)

Even 200 ms seems a bit slow to me, since all I want is the latest record for each device (and the table is indexed).

This is the index it uses:

CREATE INDEX call_logs_timestamp 
ON public.call_logs USING btree 
(timestamp) 
TABLESPACE pg_default; 

I have also tried the following index, but it does not help at all:

CREATE INDEX dev_ts_1 
ON public.call_logs USING btree 
(device_id, timestamp DESC, working) 
TABLESPACE pg_default; 

Any ideas? Am I missing something obvious?

Answers

1

200 ms really isn't that bad for going through 500k rows. But for this query:

SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working 
FROM call_logs c 
ORDER BY c.device_id, c.timestamp desc 

your index on call_logs(device_id, timestamp desc, working) should be an optimal index.

Two other ways to write the query for the same index are:

-- seqnum = 1 marks the newest row within each device_id
select c.device_id, c.timestamp, c.working
from (select c.device_id, c.timestamp, c.working,
             row_number() over (partition by device_id order by timestamp desc) as seqnum
      from call_logs c
     ) c
where seqnum = 1;

and:

select c.device_id, c.timestamp, c.working 
from call_logs c 
where not exists (select 1 
        from call_logs c2 
        where c2.device_id = c.device_id and 
         c2.timestamp > c.timestamp 
       ); 
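
A side note on the faster query in the question: it joins on timestamp alone, which would explain why its plan reports 15 rows from the nested loop while the aggregate yields only 13 devices. A minimal sketch of the same approach with device_id added to the join condition, assuming the call_logs table from the question:

```sql
-- Latest row per device via a grouped subquery.
-- Joining on device_id as well as timestamp keeps a row from matching
-- another device's maximum timestamp.
SELECT c.device_id, c.timestamp, c.working
FROM call_logs c
INNER JOIN (SELECT device_id, MAX(timestamp) AS timestamp
            FROM call_logs
            GROUP BY device_id) newest
        ON newest.device_id = c.device_id
       AND newest.timestamp = c.timestamp;
```

Like the not exists form, this can still return more than one row for a device when several of its rows share the same maximum timestamp.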
+0

The index is not being used. But I'm not sure what you mean by an optimal index? – user1434177

+0

@user1434177 . . . Optimal means that this is the best index for the query. The statistics on the table might be incorrect. –

+0

Thanks, I ran VACUUM ANALYZE; and now it takes 74 ms to run. – user1434177
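
The fix from the comments can be sketched as follows: refresh the planner statistics, then re-run the plan to see whether the (device_id, timestamp DESC, working) index is picked up. Restricting VACUUM ANALYZE to call_logs is an assumption here; the comment does not say whether it was run on one table or on the whole database.

```sql
-- Refresh planner statistics (and the visibility map) for the table.
-- Assumption: only call_logs needs it; a bare VACUUM ANALYZE covers every table.
VACUUM ANALYZE call_logs;

-- Re-check the plan and timing for the original query.
EXPLAIN ANALYZE
SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working
FROM call_logs c
ORDER BY c.device_id, c.timestamp DESC;
```

If the index is now used, the Sort node with "external merge Disk" from the first plan should no longer appear, since the index already returns rows in the requested order.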