ORDER BY with a LEFT JOIN does not use the index and is very slow

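I have the following two queries. Query 1 is fast because it uses the indexes (the plan shows a merge join over two index scans), while query 2 uses a hash join and is much slower.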

Query 1 orders by a column of table 1; query 2 orders by a column of table 2.

Query 1

learning=# explain analyze 
select * 
from users left join 
    access_logs 
    on users.userid = access_logs.userid 
order by users.userid 
limit 10 offset 90; 


                QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Limit (cost=14.00..15.46 rows=10 width=104) (actual time=1.330..1.504 rows=10 loops=1) 
    -> Merge Left Join (cost=0.85..291532.97 rows=1995958 width=104) (actual time=0.037..1.482 rows=100 loops=1) 
     Merge Cond: (users.userid = access_logs.userid) 
     -> Index Scan using users_pkey on users (cost=0.43..151132.75 rows=1995958 width=76) (actual time=0.018..1.135 rows=100 loops=1) 
     -> Index Scan using access_logs_userid_idx on access_logs (cost=0.43..110471.45 rows=1995958 width=28) (actual time=0.012..0.198 rows=100 loops=1) 
Planning time: 0.469 ms 
Execution time: 1.569 ms

Query 2

learning=# explain analyze 
select * 
from users left join 
    access_logs 
    on users.userid = access_logs.userid 
order by access_logs.userid 
limit 10 offset 90; 
                    QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------ 
Limit (cost=293584.20..293584.23 rows=10 width=104) (actual time=3821.432..3821.439 rows=10 loops=1) 
    -> Sort (cost=293583.98..298573.87 rows=1995958 width=104) (actual time=3821.391..3821.415 rows=100 loops=1) 
     Sort Key: access_logs.userid 
     Sort Method: top-N heapsort Memory: 51kB 
     -> Hash Left Join (cost=73231.06..217299.90 rows=1995958 width=104) (actual time=539.859..3168.754 rows=1995958 loops=1) 
       Hash Cond: (users.userid = access_logs.userid) 
       -> Seq Scan on users (cost=0.00..44814.58 rows=1995958 width=76) (actual time=0.009..443.260 rows=1995958 loops=1) 
       -> Hash (cost=34636.58..34636.58 rows=1995958 width=28) (actual time=539.112..539.112 rows=1995958 loops=1) 
        Buckets: 262144 Batches: 2 Memory Usage: 58532kB 
        -> Seq Scan on access_logs (cost=0.00..34636.58 rows=1995958 width=28) (actual time=0.006..170.061 rows=1995958 loops=1) 
Planning time: 0.480 ms 
Execution time: 3832.245 ms

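Question

The second query is slow because, as the plan shows, the sort is performed after the join.
Why does sorting on the second table's column not use the index? Below is the plan for the sort alone.

Query: explain analyze select * from access_logs order by userid limit 10 offset 90;

Plan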

Limit (cost=5.41..5.96 rows=10 width=28) (actual time=0.199..0.218 rows=10 loops=1) 
    -> Index Scan using access_logs_userid_idx on access_logs (cost=0.43..110471.45 rows=1995958 width=28) (actual time=0.029..0.201 rows=100 loops=1) 
Planning time: 0.120 ms 
Execution time: 0.252 ms

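Edit 1:

My goal is not to compare the two queries. The result I actually want is that of query 2; I included query 1 only for comparison, to help me understand the difference.

The ORDER BY is not limited to the join column. The user can also sort by another column of table 2, which gives the plan below.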

learning=# explain analyze select * from users left join access_logs on users.userid=access_logs.userid order by access_logs.last_login limit 10; 
                    QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------ 
Limit (cost=260431.83..260431.86 rows=10 width=104) (actual time=3846.625..3846.627 rows=10 loops=1) 
    -> Sort (cost=260431.83..265421.73 rows=1995958 width=104) (actual time=3846.623..3846.623 rows=10 loops=1) 
     Sort Key: access_logs.last_login 
     Sort Method: top-N heapsort Memory: 27kB 
     -> Hash Left Join (cost=73231.06..217299.90 rows=1995958 width=104) (actual time=567.104..3174.818 rows=1995958 loops=1) 
       Hash Cond: (users.userid = access_logs.userid) 
       -> Seq Scan on users (cost=0.00..44814.58 rows=1995958 width=76) (actual time=0.007..443.364 rows=1995958 loops=1) 
       -> Hash (cost=34636.58..34636.58 rows=1995958 width=28) (actual time=566.814..566.814 rows=1995958 loops=1) 
        Buckets: 262144 Batches: 2 Memory Usage: 58532kB 
        -> Seq Scan on access_logs (cost=0.00..34636.58 rows=1995958 width=28) (actual time=0.004..169.137 rows=1995958 loops=1) 
Planning time: 0.490 ms 
Execution time: 3857.171 ms

Source

2015-12-22 Greedy Coder

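The two queries return completely different result sets. The outer join may return NULL in 'access_logs.userid', and you ORDER BY that column. – dnoeth

dnoeth is right. Only if 'access_log.userid' contains no 'null' values is ordering by 'users.userid' the same as ordering by 'access_log.userid' (because they are the join columns and therefore equal). –

@a_horse_with_no_name: I made an edit above; sorry for not mentioning it first –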

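The sort in the second query does not use the index because the index cannot guarantee that all of the values being sorted are covered. If some rows in users have no match in access_logs, the LEFT JOIN produces NULLs for them. The query refers to those NULLs as access_logs.userid, but they do not actually exist in access_logs and are therefore not in the index.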
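
To make the NULL issue concrete, here is a small self-contained sketch for PostgreSQL. The temporary tables `demo_users` and `demo_access_logs` and their rows are made up for illustration; only the shape of the join comes from the question.

```sql
-- Hypothetical miniature tables, not the asker's schema.
CREATE TEMP TABLE demo_users (userid int PRIMARY KEY);
CREATE TEMP TABLE demo_access_logs (userid int, last_login timestamp);
CREATE INDEX ON demo_access_logs (userid);

INSERT INTO demo_users VALUES (1), (2), (3);
-- Deliberately no row for user 2.
INSERT INTO demo_access_logs VALUES (1, '2015-12-01'), (3, '2015-12-02');

-- User 2 has no match, so the join emits a row whose
-- demo_access_logs side is entirely NULL.
SELECT u.userid AS users_userid, l.userid AS logs_userid
FROM demo_users u
LEFT JOIN demo_access_logs l ON u.userid = l.userid
ORDER BY l.userid;
```

The row for user 2 comes back with a NULL `logs_userid`. That value is produced by the join itself rather than read from `demo_access_logs`, so a scan of the index on that table alone cannot place it in the sorted output.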

A workaround is to create a default initial record in access_logs for every user and then use an INNER JOIN.
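
A rough sketch of that workaround, using the table and column names from the question. It assumes that every column of access_logs other than userid is nullable or has a default, since the full schema is not shown. Whether the planner then chooses the index still depends on its cost estimates.

```sql
-- Assumption: a placeholder row with only userid set is acceptable
-- for users that have no access_logs entry yet.
INSERT INTO access_logs (userid)
SELECT u.userid
FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM access_logs l WHERE l.userid = u.userid
);

-- Every user now has at least one match, so the inner join keeps all users
-- and access_logs.userid can no longer be NULL in the result.
EXPLAIN ANALYZE
SELECT *
FROM users
JOIN access_logs ON users.userid = access_logs.userid
ORDER BY access_logs.userid
LIMIT 10 OFFSET 90;
```

Comparing the resulting plan with the one for query 2 shows whether the Sort and Hash Left Join nodes have been replaced by index scans.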

Source

2015-12-22 11:08:32

So if that is the case, there is no way for this query to use the index? –

Thanks for the update.. it is clear now –
