DISTINCT with ORDER BY is very slow

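I am using Postgres for the first time and finding it quite slow to run DISTINCT and GROUP BY queries. Right now I am trying to find the latest record for each device and whether it is working. This is the first query I came up with: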

SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working 
     FROM call_logs c 
     ORDER BY c.device_id, c.timestamp desc

It works, but it takes a long time to run:

Unique (cost=94840.24..97370.54 rows=11 width=17) (actual time=424.424..556.253 rows=13 loops=1)
  -> Sort (cost=94840.24..96105.39 rows=506061 width=17) (actual time=424.423..531.905 rows=506061 loops=1)
        Sort Key: device_id, "timestamp" DESC
        Sort Method: external merge Disk: 13272kB
        -> Seq Scan on call_logs c (cost=0.00..36512.61 rows=506061 width=17) (actual time=0.059..162.932 rows=506061 loops=1)
Planning time: 0.152 ms
Execution time: 557.957 ms
(7 rows)

I have rewritten the query as follows, which is faster but very ugly:

SELECT c.device_id, c.timestamp, c.working FROM call_logs c 
    INNER JOIN (SELECT c.device_id, MAX(c.timestamp) AS timestamp 
               FROM call_logs c 
               GROUP BY c.device_id) 
               newest on newest.timestamp = c.timestamp

And the EXPLAIN ANALYZE output:

Nested Loop (cost=39043.34..39136.08 rows=12 width=17) (actual time=216.406..216.580 rows=15 loops=1)
  -> HashAggregate (cost=39042.91..39043.02 rows=11 width=16) (actual time=216.347..216.351 rows=13 loops=1)
        Group Key: c_1.device_id
        -> Seq Scan on call_logs c_1 (cost=0.00..36512.61 rows=506061 width=16) (actual time=0.026..125.482 rows=506061 loops=1)
  -> Index Scan using call_logs_timestamp on call_logs c (cost=0.42..8.44 rows=1 width=17) (actual time=0.016..0.016 rows=1 loops=13)
        Index Cond: ("timestamp" = (max(c_1."timestamp")))
Planning time: 0.318 ms
Execution time: 216.631 ms
(8 rows)

Even 200 ms seems a bit slow to me, since all I want is the newest record for each device (and the table is indexed).

This is the index it is using:

CREATE INDEX call_logs_timestamp 
ON public.call_logs USING btree 
(timestamp) 
TABLESPACE pg_default;

I have also tried the index below, but it does not help at all:

CREATE INDEX dev_ts_1 
ON public.call_logs USING btree 
(device_id, timestamp DESC, working) 
TABLESPACE pg_default;

Any ideas? Am I missing something obvious?

2017-05-12 user1434177

200 ms really isn't that bad for going through 500K rows. But for this query:

SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working 
FROM call_logs c 
ORDER BY c.device_id, c.timestamp desc

your index on call_logs(device_id, timestamp desc, working) should be an optimal index.

Two other ways to write the query that can use the same index are:

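-- number each device's rows newest-first, then keep only the first one
select c.device_id, c.timestamp, c.working
from (select c.device_id, c.timestamp, c.working,
             row_number() over (partition by c.device_id order by c.timestamp desc) as seqnum
      from call_logs c
     ) c
where seqnum = 1;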

and:

select c.device_id, c.timestamp, c.working 
from call_logs c 
where not exists (select 1 
        from call_logs c2 
        where c2.device_id = c.device_id and 
         c2.timestamp > c.timestamp 
       );
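
With only about a dozen devices spread over roughly 500K rows, another pattern sometimes used for this kind of query is to emulate a loose index scan with a recursive CTE, hopping from one device_id to the next through the (device_id, timestamp desc, working) index. The sketch below assumes that the dev_ts_1 index from the question exists and that device_id and timestamp are never NULL; it has not been benchmarked against this table, so treat it as something to try rather than a guaranteed improvement.

```sql
-- Sketch only: assumes index dev_ts_1 (device_id, timestamp DESC, working)
-- and NOT NULL device_id / timestamp columns.
WITH RECURSIVE latest AS (
    -- Anchor: newest row of the smallest device_id.
    (SELECT c.device_id, c.timestamp, c.working
     FROM call_logs c
     ORDER BY c.device_id, c.timestamp DESC
     LIMIT 1)
    UNION ALL
    -- Step: newest row of the next larger device_id, one index probe per device.
    SELECT n.device_id, n.timestamp, n.working
    FROM latest l
    CROSS JOIN LATERAL (
        SELECT c.device_id, c.timestamp, c.working
        FROM call_logs c
        WHERE c.device_id > l.device_id
        ORDER BY c.device_id, c.timestamp DESC
        LIMIT 1
    ) n
)
SELECT l.device_id, l.timestamp, l.working
FROM latest l
ORDER BY l.device_id;
```

The recursion stops on its own once no larger device_id exists, so the number of index probes is about the number of distinct devices rather than the number of rows.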

2017-05-12 01:53:08

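The index is not being used. But I am not sure what you mean by an optimal index? – user1434177

@user1434177 . . . Optimal means it is the best index for this query. The statistics on the table may be out of date. –

Thanks, I ran VACUUM ANALYZE; and the query now takes 74 ms to run. – user1434177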
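
The fix from the comments can be sketched roughly as follows. Restricting VACUUM ANALYZE to the call_logs table and adding the BUFFERS option are choices made for this illustration; the comment above ran a plain VACUUM ANALYZE. Refreshing the statistics gives the planner accurate row estimates, and the vacuum updates the visibility map, which matters because dev_ts_1 contains every column the query reads and could therefore support an index-only scan.

```sql
-- Refresh planner statistics and the visibility map for this one table.
VACUUM ANALYZE call_logs;

-- Re-check the plan: look for an index (or index-only) scan on dev_ts_1
-- in place of the Seq Scan + external merge Sort seen earlier.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (c.device_id) c.device_id, c.timestamp, c.working
FROM call_logs c
ORDER BY c.device_id, c.timestamp DESC;
```

If the plan still shows a sequential scan after this, the planner has estimated that reading the whole table is cheaper, and the timings in the EXPLAIN output are the place to check whether that estimate holds.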
