2015-01-05
1

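Optimizing a SQL JOIN call

I have a poorly performing SQL query. I have done some research on joins, watched tutorials, made sure I have the right indexes defined, and so on, but honestly I am at a bit of a loss as to how to improve the performance of this query.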

I have the following schema definitions:

create_table "training_plans", :force => true do |t| 
    t.integer "user_id" 
end 

add_index "training_plans", ["user_id"], :name => "index_training_plans_on_user_id" 

create_table "training_weeks", :force => true do |t| 
    t.integer "training_plan_id" 
    t.date  "start_date" 
end 

add_index "training_weeks", ["training_plan_id", "start_date"], :name => "index_training_weeks_on_training_plan_id_and_start_date" 
add_index "training_weeks", ["training_plan_id"], :name => "index_training_weeks_on_training_plan_id" 

create_table "training_efforts", :force => true do |t| 
    t.string "name" 
    t.date  "plandate" 
    t.integer "training_week_id" 
end 

add_index "training_efforts", ["plandate"], :name => "index_training_efforts_on_plandate" 
add_index "training_efforts", ["training_week_id", "plandate"], :name => "index_training_efforts_on_training_week_id_and_plandate" 
add_index "training_efforts", ["training_week_id"], :name => "index_training_efforts_on_training_week_id" 

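I then make the following call to collect all training_efforts associated with a particular training_plan, including all associated ride objects, where the training_effort plandates fall within the target date range, and sort the results.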

tefts = self.training_efforts.includes(:rides).order("plandate ASC").where("plandate >= ? AND plandate <= ?", 
                 beginning_date, 
                 end_date) 

This produces the following query output:

TrainingEffort Load (3393.6ms) SELECT "training_efforts".* FROM "training_efforts" 
    INNER JOIN "training_weeks" ON "training_efforts"."training_week_id" = "training_weeks"."id" 
    WHERE "training_weeks"."training_plan_id" = 104 
    AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03') ORDER BY plandate ASC 

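I believe I have the right indexes defined. The tables are not large. Yet the query takes a great deal of time. As further background, this is on Heroku Postgres. Finally, I should mention that on my development system the query is slower than most (3.3 ms), but still nowhere near 1000 times slower than typical...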

Thanks in advance for any help optimizing this query.

UPDATE: Here is the EXPLAIN output for the query (issued on my development system):

explain SELECT "training_efforts".* FROM "training_efforts" INNER JOIN "training_weeks" 
    ON "training_efforts"."training_week_id" = "training_weeks"."id" 
    WHERE "training_weeks"."training_plan_id" = 7 
    AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03') ORDER BY plandate ASC; 
              QUERY PLAN           
----------------------------------------------------------------------------------------------- 
Sort (cost=430.52..432.04 rows=606 width=120) 
    Sort Key: training_efforts.plandate 
    -> Hash Join (cost=15.12..402.51 rows=606 width=120) 
     Hash Cond: (training_efforts.training_week_id = training_weeks.id) 
     -> Seq Scan on training_efforts (cost=0.00..377.25 rows=1089 width=120) 
       Filter: ((plandate >= '2015-01-05'::date) AND (plandate <= '2016-01-03'::date)) 
     -> Hash (cost=11.86..11.86 rows=261 width=4) 
       -> Seq Scan on training_weeks (cost=0.00..11.86 rows=261 width=4) 
        Filter: (training_plan_id = 7) 
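The plan above contains only the planner's cost estimates from the development database, so on its own it cannot account for the 3393.6 ms seen in production. A minimal sketch of how actual timings could be gathered, assuming a psql session against the production database is available and that plan id 104 (from the logged query) is a representative slow case:

```sql
-- ANALYZE really executes the query and reports actual row counts and timings
-- next to the estimates; BUFFERS adds how many pages were read from cache or disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "training_efforts".*
FROM "training_efforts"
INNER JOIN "training_weeks"
    ON "training_efforts"."training_week_id" = "training_weeks"."id"
WHERE "training_weeks"."training_plan_id" = 104
  AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03')
ORDER BY plandate ASC;
```

Comparing the estimated and actual row counts node by node would help show whether the time goes into a misestimated plan or simply into slow reads on the production instance.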

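UPDATE 2: To try a different query and see whether my indexes would be used, and noting that there are 7 times as many training_efforts as training_weeks (both tables have a date column), I tried searching on the training_week dates instead of the training_effort dates, as follows: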

explain SELECT "training_efforts".* FROM "training_efforts" INNER JOIN "training_weeks" 
    ON "training_weeks"."id" = "training_efforts"."training_week_id" 
    WHERE "training_weeks"."id" IN (SELECT "training_weeks"."id" FROM "training_weeks" 
    WHERE "training_weeks"."training_plan_id" = 7 AND (start_date >= '2015-01-05' AND start_date <= '2016-01-03')) 
    ORDER BY plandate ASC; 
                    QUERY PLAN                  
---------------------------------------------------------------------------------------------------------------------------------------------------- 
Sort (cost=376.83..378.34 rows=602 width=120) 
    Sort Key: training_efforts.plandate 
    -> Nested Loop (cost=14.23..349.04 rows=602 width=120) 
     -> Hash Semi Join (cost=13.95..26.83 rows=86 width=8) 
       Hash Cond: (training_weeks.id = training_weeks_1.id) 
       -> Seq Scan on training_weeks (cost=0.00..10.69 rows=469 width=4) 
       -> Hash (cost=12.87..12.87 rows=86 width=4) 
        -> Bitmap Heap Scan on training_weeks training_weeks_1 (cost=5.37..12.87 rows=86 width=4) 
          Recheck Cond: ((training_plan_id = 7) AND (start_date >= '2015-01-05'::date) AND (start_date <= '2016-01-03'::date)) 
          -> Bitmap Index Scan on index_training_weeks_on_training_plan_id_and_start_date (cost=0.00..5.35 rows=86 width=0) 
           Index Cond: ((training_plan_id = 7) AND (start_date >= '2015-01-05'::date) AND (start_date <= '2016-01-03'::date)) 
     -> Index Scan using index_training_efforts_on_training_week_id on training_efforts (cost=0.28..3.68 rows=7 width=120) 
       Index Cond: (training_week_id = training_weeks.id) 

This seems slightly better, but I am still not confident that it is optimal...

Answers

0

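How many rows are in each table? Did you recreate these tables recently, or are they old? Have you analyzed them recently? It looks like the query is doing sequential scans and not using any indexes.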

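I would run

vacuum analyze 

on your entire database, or at least on these two tables. Often, if the optimizer does not have accurate statistics for a table, it will skip the indexes.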
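Restricted to the two tables in the query, that could look like the sketch below. The table names come from the question's schema; note that VACUUM cannot be run inside a transaction block.

```sql
-- Reclaim dead rows and refresh the planner statistics, one table at a time.
VACUUM ANALYZE training_efforts;
VACUUM ANALYZE training_weeks;
```

Running the slow query again afterwards shows whether fresher statistics change the chosen plan.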

+0

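I agree about the indexes... why would that be? I will try another query format and see whether it uses the indexes... These three tables have thousands of rows (5-30k). They have been around for months. I just checked, and the stats report that they were auto-vacuumed within the last two days.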
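One way to check when the tables were last vacuumed and analyzed is PostgreSQL's pg_stat_user_tables statistics view. A small sketch, assuming the three table names from the question's schema:

```sql
-- last_autovacuum / last_autoanalyze are set by the autovacuum daemon;
-- last_vacuum / last_analyze are set by manual runs.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('training_plans', 'training_weeks', 'training_efforts')
ORDER BY relname;
```

An autovacuum does not necessarily imply a recent analyze, so the two pairs of timestamps are worth reading separately.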

+0

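Did the execution plan (or the speed) change after analyzing? Running vacuum analyze is important because the optimizer makes its decisions based on statistics about the data. If it thinks your data is very small, or that you will be querying most of it, it will ignore the indexes entirely, because in those cases they can be inefficient.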
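To see whether an index-based plan would actually be faster than the sequential scans the planner picked, one diagnostic is to discourage sequential scans for a single transaction and compare timings. This is only a sketch for experimentation, using the plan id 7 query from the question, and not a setting to leave enabled:

```sql
BEGIN;
-- Heavily penalizes sequential scans in the planner's cost model,
-- for this transaction only; it does not forbid them outright.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT "training_efforts".*
FROM "training_efforts"
INNER JOIN "training_weeks"
    ON "training_efforts"."training_week_id" = "training_weeks"."id"
WHERE "training_weeks"."training_plan_id" = 7
  AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03')
ORDER BY plandate ASC;

ROLLBACK;
```

If the forced index plan is not faster, the planner's choice of sequential scans on tables of this size is probably reasonable and the slowness lies elsewhere.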

+0

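Thanks joe and @khampson. I am marking this as the answer because it came closest to solving the problem. I need to wait a few days to look at the logs before I am satisfied with the results. Basically, I changed the query to 'tefts = TrainingEffort.includes(:rides).order("plandate ASC").joins(:training_week).where(:training_weeks => {:id => self.training_weeks.where("start_date >= ? AND start_date <= ?", beginning_date, end_date)})'. Then, after vacuuming the DB, I upgraded the database on Heroku from hobby to standard. That combination did the trick.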

0

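It looks like you are not actually using any output from the JOIN, so I would suggest dropping it entirely and seeing whether that improves performance.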

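I would suggest using a raw query. You should be able to run it through ActiveRecord with the SQL and its parameters: for parameters that need to be interpolated by the SQL library (i.e. those that vary), substitute ? in the SQL and pass the values alongside it, for example with the array form of find_by_sql, as in TrainingEffort.find_by_sql([sql, plan_id, beginning_date, end_date]). Note that connection.execute takes the SQL string only (its second argument is a name used for logging), so it does not bind ? placeholders for you.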

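For the raw SQL, I would suggest trying something like the following (replacing any values that will vary with placeholders and parameters as needed). I suspect this will perform much better.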

SELECT te.* 
FROM training_efforts AS te 
WHERE EXISTS (SELECT 1 
       FROM training_weeks AS tw 
       WHERE tw.id = te.training_week_id 
       AND tw.training_plan_id = 7 
       AND tw.start_date >= '2015-01-05' AND tw.start_date <= '2016-01-03' 
      ) 
ORDER BY te.plandate ASC 

As for turning this into an ActiveRecord query, I am not sure it offers quite this level of control; it is probably best kept as a raw query.
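To illustrate the placeholder idea at the SQL level, the EXISTS query can be written as a PostgreSQL prepared statement with positional parameters. This is a sketch only: the statement name efforts_for_plan is hypothetical, and the subquery joins on training_weeks.id, the primary key implied by the question's schema.

```sql
-- $1 = training plan id, $2 = range start, $3 = range end
PREPARE efforts_for_plan (integer, date, date) AS
SELECT te.*
FROM training_efforts AS te
WHERE EXISTS (SELECT 1
              FROM training_weeks AS tw
              WHERE tw.id = te.training_week_id
                AND tw.training_plan_id = $1
                AND tw.start_date >= $2
                AND tw.start_date <= $3)
ORDER BY te.plandate ASC;

-- Run it with the values from the question, then release the statement.
EXECUTE efforts_for_plan(7, '2015-01-05', '2016-01-03');
DEALLOCATE efforts_for_plan;
```

Because the date filter is on training_weeks.start_date, the composite index on (training_plan_id, start_date) from the question's schema can serve the subquery.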