2015-12-22 46 views
0

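ORDER BY with LEFT JOIN does not use the index and is very slow

I have the following two queries. Query 1 is fast because it uses indexes (the plan shows a merge join over two index scans), while Query 2 uses a hash join and is much slower.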

Query 1 orders by a column of table 1; Query 2 orders by a column of table 2.

Query 1

learning=# explain analyze 
select * 
from users left join 
    access_logs 
    on users.userid = access_logs.userid 
order by users.userid 
limit 10 offset 90; 


                QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Limit (cost=14.00..15.46 rows=10 width=104) (actual time=1.330..1.504 rows=10 loops=1) 
    -> Merge Left Join (cost=0.85..291532.97 rows=1995958 width=104) (actual time=0.037..1.482 rows=100 loops=1) 
     Merge Cond: (users.userid = access_logs.userid) 
     -> Index Scan using users_pkey on users (cost=0.43..151132.75 rows=1995958 width=76) (actual time=0.018..1.135 rows=100 loops=1) 
     -> Index Scan using access_logs_userid_idx on access_logs (cost=0.43..110471.45 rows=1995958 width=28) (actual time=0.012..0.198 rows=100 loops=1) 
Planning time: 0.469 ms 
Execution time: 1.569 ms 

Query 2

learning=# explain analyze 
select * 
from users left join 
    access_logs 
    on users.userid = access_logs.userid 
order by access_logs.userid 
limit 10 offset 90; 
                    QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------ 
Limit (cost=293584.20..293584.23 rows=10 width=104) (actual time=3821.432..3821.439 rows=10 loops=1) 
    -> Sort (cost=293583.98..298573.87 rows=1995958 width=104) (actual time=3821.391..3821.415 rows=100 loops=1) 
     Sort Key: access_logs.userid 
     Sort Method: top-N heapsort Memory: 51kB 
     -> Hash Left Join (cost=73231.06..217299.90 rows=1995958 width=104) (actual time=539.859..3168.754 rows=1995958 loops=1) 
       Hash Cond: (users.userid = access_logs.userid) 
       -> Seq Scan on users (cost=0.00..44814.58 rows=1995958 width=76) (actual time=0.009..443.260 rows=1995958 loops=1) 
       -> Hash (cost=34636.58..34636.58 rows=1995958 width=28) (actual time=539.112..539.112 rows=1995958 loops=1) 
        Buckets: 262144 Batches: 2 Memory Usage: 58532kB 
        -> Seq Scan on access_logs (cost=0.00..34636.58 rows=1995958 width=28) (actual time=0.006..170.061 rows=1995958 loops=1) 
Planning time: 0.480 ms 
Execution time: 3832.245 ms 

Questions

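  • The second query is slow because, in its plan, the sort is not done before the join.
  • Why does sorting on the second table not use the index? A plan of that kind is shown below.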

Query: explain analyze select * from access_logs order by userid limit 10 offset 90;

Plan

Limit (cost=5.41..5.96 rows=10 width=28) (actual time=0.199..0.218 rows=10 loops=1) 
    -> Index Scan using access_logs_userid_idx on access_logs (cost=0.43..110471.45 rows=1995958 width=28) (actual time=0.029..0.201 rows=100 loops=1) 
Planning time: 0.120 ms 
Execution time: 0.252 ms 

Edit 1

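My goal is not to compare the two queries. The result I actually want is the one from Query 2; I included Query 1 only so that I could understand the difference by comparison.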

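The ORDER BY is not limited to the join column. The user can also sort by another column of table 2, and the plan is then as follows.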

learning=# explain analyze select * from users left join access_logs on users.userid=access_logs.userid order by access_logs.last_login limit 10; 
                    QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------ 
Limit (cost=260431.83..260431.86 rows=10 width=104) (actual time=3846.625..3846.627 rows=10 loops=1) 
    -> Sort (cost=260431.83..265421.73 rows=1995958 width=104) (actual time=3846.623..3846.623 rows=10 loops=1) 
     Sort Key: access_logs.last_login 
     Sort Method: top-N heapsort Memory: 27kB 
     -> Hash Left Join (cost=73231.06..217299.90 rows=1995958 width=104) (actual time=567.104..3174.818 rows=1995958 loops=1) 
       Hash Cond: (users.userid = access_logs.userid) 
       -> Seq Scan on users (cost=0.00..44814.58 rows=1995958 width=76) (actual time=0.007..443.364 rows=1995958 loops=1) 
       -> Hash (cost=34636.58..34636.58 rows=1995958 width=28) (actual time=566.814..566.814 rows=1995958 loops=1) 
        Buckets: 262144 Batches: 2 Memory Usage: 58532kB 
        -> Seq Scan on access_logs (cost=0.00..34636.58 rows=1995958 width=28) (actual time=0.004..169.137 rows=1995958 loops=1) 
Planning time: 0.490 ms 
Execution time: 3857.171 ms 
+2

The two queries return different result sets. The outer join may return NULL in 'access_logs.userid', and that is the column you ORDER BY. – dnoeth

+0

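dnoeth is right. If 'access_log.userid' contained no 'null' values, then ordering by 'users.userid' would be the same as ordering by 'access_log.userid' (because they are the join columns and therefore equal). –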

+0

@a_horse_with_no_name: I made an edit above, sorry for not mentioning it earlier –

Answers

2

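The sort in the second query does not use the index because the index does not cover every value that has to be sorted. If some records in users have no match in access_logs, the Left Join produces null values for access_logs.userid. Those values are referenced in the query but do not actually exist in access_logs, so they are not covered by the index.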
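
To make the point about unmatched rows concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The three-row data set and the name column are made up for illustration; only the table names, userid and last_login come from the question. It only shows where the NULLs come from and says nothing about which plan PostgreSQL picks.

```python
import sqlite3

# Hypothetical miniature versions of the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE access_logs (userid INTEGER, last_login TEXT);
    CREATE INDEX access_logs_userid_idx ON access_logs (userid);
    INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- user 2 deliberately has no access_logs row
    INSERT INTO access_logs VALUES (1, '2015-12-01'), (3, '2015-12-02');
""")

rows = conn.execute("""
    SELECT users.userid, access_logs.userid
    FROM users LEFT JOIN access_logs ON users.userid = access_logs.userid
""").fetchall()

# Rows whose access_logs side is NULL (None in Python). These values exist
# only in the join output, not in access_logs or in its index.
unmatched = sorted(u for u, a in rows if a is None)
print(unmatched)  # [2]
```

Where those NULL rows land in the sort depends on the engine (PostgreSQL puts them last in ascending order by default, SQLite puts them first), so the sketch checks only which rows are NULL, not their position.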

A workaround is to create a default initial record in access_log for every user and use an Inner Join.
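
A rough sketch of that workaround, again with sqlite3 and made-up data. Using a NULL last_login for the placeholder row is my assumption, not something the question specifies. Whether PostgreSQL then chooses an index-driven plan for the inner join is not shown here and would need checking with EXPLAIN ANALYZE on the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE access_logs (userid INTEGER, last_login TEXT);
    INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO access_logs VALUES (1, '2015-12-01'), (3, '2015-12-02');

    -- Backfill one placeholder row for every user that has no access_logs row.
    -- The NULL last_login is an assumed placeholder value.
    INSERT INTO access_logs (userid, last_login)
    SELECT u.userid, NULL
    FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM access_logs a WHERE a.userid = u.userid);
""")

# With every user matched, an inner join returns the same users as the left
# join did, and access_logs.userid is never NULL in the output.
inner = conn.execute("""
    SELECT users.userid, access_logs.userid
    FROM users JOIN access_logs ON users.userid = access_logs.userid
    ORDER BY access_logs.userid
""").fetchall()
print(inner)  # [(1, 1), (2, 2), (3, 3)]
```

Since both columns are now always equal, ordering by access_logs.userid and ordering by users.userid give the same sequence. Sorting by last_login would still have to deal with the placeholder rows.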

+0

So if that is the case, there is no way for this query to use the index? –

+0

Thanks for the update.. it is clear now –