2017-01-11 91 views

Efficient PostgreSQL query with a boolean-style filter on a 15M-row table holding user inbox data

Here is the slow query in a nutshell:

SELECT * 
FROM dialogs 
WHERE user_id = 1234 
AND deleted_at IS NULL 
LIMIT 21 

Full query (irrelevant fields removed):

SELECT "dialogs"."id", "dialogs"."subject", "dialogs"."product_id", "dialogs"."user_id", "dialogs"."participant_id", "dialogs"."thread_id", "dialogs"."last_message_id", "dialogs"."last_message_at", "dialogs"."read_at", "dialogs"."deleted_at", "products"."id", ... , T4."id", ... , "messages"."id", ..., 
FROM "dialogs" 
LEFT OUTER JOIN "products" ON ("dialogs"."product_id" = "products"."id") 
INNER JOIN "auth_user" T4 ON ("dialogs"."participant_id" = T4."id") 
LEFT OUTER JOIN "messages" ON ("dialogs"."last_message_id" = "messages"."id") 
WHERE ("dialogs"."deleted_at" IS NULL AND "dialogs"."user_id" = 9069) 
ORDER BY "dialogs"."last_message_id" DESC 
LIMIT 21; 

EXPLAIN ANALYZE output:

Limit (cost=1.85..28061.24 rows=21 width=1693) (actual time=4.700..93087.871 rows=17 loops=1) 
    -> Nested Loop Left Join (cost=1.85..9707215.30 rows=7265 width=1693) (actual time=4.699..93087.861 rows=17 loops=1) 
     -> Nested Loop (cost=1.41..9647421.07 rows=7265 width=1457) (actual time=4.689..93062.481 rows=17 loops=1) 
       -> Nested Loop Left Join (cost=0.99..9611285.66 rows=7265 width=1115) (actual time=4.676..93062.292 rows=17 loops=1) 
        -> Index Scan Backward using dialogs_last_message_id on dialogs (cost=0.56..9554417.92 rows=7265 width=102) (actual time=4.629..93062.050 rows=17 loops=1) 
          Filter: ((deleted_at IS NULL) AND (user_id = 9069)) 
          Rows Removed by Filter: 6852907 
        -> Index Scan using products_pkey on products (cost=0.43..7.82 rows=1 width=1013) (actual time=0.012..0.012 rows=1 loops=17) 
          Index Cond: (dialogs.product_id = id) 
       -> Index Scan using auth_user_pkey on auth_user t4 (cost=0.42..4.96 rows=1 width=342) (actual time=0.009..0.010 rows=1 loops=17) 
        Index Cond: (id = dialogs.participant_id) 
     -> Index Scan using messages_pkey on messages (cost=0.44..8.22 rows=1 width=236) (actual time=1.491..1.492 rows=1 loops=17) 
       Index Cond: (dialogs.last_message_id = id) 
Total runtime: 93091.494 ms 
(14 rows) 
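  • OFFSET is not used.
  • There is an index on the user_id field.
  • The index on deleted_at is not used because the condition is not selective (90% of the values are actually NULL). A partial index (... WHERE deleted_at IS NULL) does not help either.
  • The query becomes especially slow when part of the result set was created long ago: it then has to scan and discard the millions of rows in between.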
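
The plan above expects 7265 matching rows (rows=7265) on the dialogs scan but actually returns only 17. A minimal sketch for checking that gap on the short form of the query, assuming the same table and the user id taken from the plan; numbers on another system will differ:

```sql
-- Actual number of matching rows for this user (the plan above returned 17).
SELECT count(*)
FROM dialogs
WHERE user_id = 9069
  AND deleted_at IS NULL;

-- Planner's estimate for the same predicate (no execution, estimate only).
EXPLAIN
SELECT *
FROM dialogs
WHERE user_id = 9069
  AND deleted_at IS NULL;

-- Optional: refresh the table statistics if they might be stale.
ANALYZE dialogs;
```

If the estimate is far above the real count, the planner may assume that walking dialogs_last_message_id backward will reach 21 matches quickly, which would be consistent with the index it chose here.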

Index list:

Indexes: 
    "dialogs_pkey" PRIMARY KEY, btree (id) 
    "dialogs_deleted_at_d57b320e_uniq" btree (deleted_at) WHERE deleted_at IS NULL 
    "dialogs_last_message_id" btree (last_message_id) 
    "dialogs_participant_id" btree (participant_id) 
    "dialogs_product_id" btree (product_id) 
    "dialogs_thread_id" btree (thread_id) 
    "dialogs_user_id" btree (user_id) 

Currently I am thinking about querying only recent data (i.e. ... WHERE last_message_at > <date 3-6 months ago>) with an appropriate index (BRIN).
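
A rough sketch of that idea, assuming last_message_at is a timestamp column and using a hypothetical index name. BRIN indexes are only available in PostgreSQL 9.5 or later, and they work best when the column's values follow the physical row order of the table:

```sql
-- Hypothetical index name; BRIN requires PostgreSQL 9.5+.
CREATE INDEX dialogs_last_message_at_brin
    ON dialogs USING brin (last_message_at);

-- Same query, restricted to dialogs active in the last 6 months.
-- The 6-month window is an illustrative choice, not a recommendation.
SELECT *
FROM dialogs
WHERE user_id = 9069
  AND deleted_at IS NULL
  AND last_message_at > now() - interval '6 months'
ORDER BY last_message_id DESC
LIMIT 21;
```

Note that this changes the result: dialogs older than the cutoff are no longer returned, so it is a trade-off and not a drop-in speedup.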

What is the best practice for speeding up queries like this?

+0

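If you run EXPLAIN on a query that uses only 'WHERE deleted_at IS NULL', do you see the speed you expect? If so, I would suggest adding a single index that covers both the 'user_id' and 'deleted_at' columns. This is often necessary because two separate indexes cannot be combined the way you might imagine, whereas one index on multiple columns gives faster query times. –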

+0

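You say the index on deleted_at is not used, but your EXPLAIN shows no seq scan either. It is a backward index scan on 'dialogs_last_message_id'. What is going on? Please paste the full query plan. –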

+1

Please post your index definitions. What do you mean by *a partial index does not help either*? An index on 'user_id' with 'WHERE deleted_at IS NULL' should help. – pozs

Answers

1

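As posted in the comments: start by creating a partial index on (user_id, last_message_id) with the condition WHERE deleted_at IS NULL.

Per your answer, this seems to be quite effective :-)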
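
For reference, a sketch of what such an index definition could look like. The index name is hypothetical, and CONCURRENTLY is an optional assumption here, added so that writes to the table are not blocked while the index builds:

```sql
-- Partial composite index: equality column first, sort column second.
-- Only rows with deleted_at IS NULL are stored in the index.
CREATE INDEX CONCURRENTLY dialogs_user_id_last_message_id
    ON dialogs (user_id, last_message_id)
    WHERE deleted_at IS NULL;
```

Within a single user_id the entries are already ordered by last_message_id, so a backward scan can return the newest rows for ORDER BY last_message_id DESC LIMIT 21 without sorting or filtering. The query must repeat the deleted_at IS NULL condition for the planner to consider the partial index.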

0

So, here are the results of the solutions I tried:

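1) The index (user_id) WHERE deleted_at IS NULL is used only in rare cases, depending on the specific value in the WHERE user_id = ? condition. Most of the time the query still has to filter out rows as before.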

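2) The greatest speedup came from the (user_id, last_message_id) WHERE deleted_at IS NULL index. Although it is 2.5 times larger than the other indexes I tested, it is used every time and is very fast. The resulting query plan: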

Limit (cost=1.72..270.45 rows=11 width=1308) (actual time=0.105..0.468 rows=8 loops=1) 
    -> Nested Loop Left Join (cost=1.72..247038.21 rows=10112 width=1308) (actual time=0.104..0.465 rows=8 loops=1) 
     -> Nested Loop (cost=1.29..164532.13 rows=10112 width=1072) (actual time=0.071..0.293 rows=8 loops=1) 
       -> Nested Loop Left Join (cost=0.86..116292.45 rows=10112 width=736) (actual time=0.057..0.198 rows=8 loops=1) 
        -> Index Scan Backward using dialogs_user_id_last_message_id_d57b320e on dialogs (cost=0.43..38842.21 rows=10112 width=102) (actual time=0.038..0.084 rows=8 loops=1) 
          Index Cond: (user_id = 9069) 
        -> Index Scan using products_pkey on products (cost=0.43..7.65 rows=1 width=634) (actual time=0.012..0.012 rows=1 loops=8) 
          Index Cond: (dialogs.product_id = id) 
       -> Index Scan using auth_user_pkey on auth_user t4 (cost=0.42..4.76 rows=1 width=336) (actual time=0.010..0.010 rows=1 loops=8) 
        Index Cond: (id = dialogs.participant_id) 
     -> Index Scan using messages_pkey on messages (cost=0.44..8.15 rows=1 width=236) (actual time=0.019..0.020 rows=1 loops=8) 
       Index Cond: (dialogs.last_message_id = id) 
Total runtime: 0.678 ms 
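
A sketch of how index size and usage could be compared on the dialogs table, using the standard pg_stat_user_indexes statistics view. The column aliases are illustrative, and the scan counters only cover the period since statistics were last reset:

```sql
-- Size and scan count of every index on dialogs, largest first.
SELECT indexrelname                                 AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan                                     AS scans_since_stats_reset
FROM pg_stat_user_indexes
WHERE relname = 'dialogs'
ORDER BY pg_relation_size(indexrelid) DESC;
```

An index that stays at zero scans under normal load is a candidate for removal, which can offset the extra space taken by the larger composite index.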

Thanks @jcaron. Your suggestion should be the accepted answer.