Improving subquery performance in Postgres

I have these two tables in my database:
Student Table

| Column     | Type     |
|------------|----------|
| student_id | integer  |
| satquan    | smallint |
| actcomp    | smallint |
| entryyear  | smallint |

Student Semester Table

| Column     | Type    |
|------------|---------|
| student_id | integer |
| semester   | integer |
| enrolled   | boolean |
| major      | text    |
| college    | text    |
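Here student_id is a unique key in the student table and a foreign key in the student semester table. The semester integer is simply 1 for the first semester, 2 for the second, and so on.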
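For readers who want to reproduce the setup, the tables described above could be declared roughly as follows. This is only a sketch: the column names and types come from the tables above, but `PRIMARY KEY` (standing in for "unique key") and the `NOT NULL` constraints are assumptions.

```sql
-- Sketch of the schema described above; constraint details are assumptions.
CREATE TABLE student (
    student_id integer PRIMARY KEY,   -- "unique key" in the description
    satquan    smallint,
    actcomp    smallint,
    entryyear  smallint
);

CREATE TABLE student_semester (
    -- foreign key back to student, as described
    student_id integer NOT NULL REFERENCES student (student_id),
    semester   integer NOT NULL,      -- 1 = first semester, 2 = second, ...
    enrolled   boolean,
    major      text,
    college    text
);
```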
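I am running queries where I want to select students by their entryyear (and sometimes by their SAT and/or ACT scores), and then get all of the associated data for those students from the student semester table.
Currently, my query looks something like this: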
SELECT * FROM student_semester
WHERE student_id IN(
SELECT student_id FROM student_semester
WHERE student_id IN(
SELECT student_id FROM student WHERE entryyear = 2006
) AND college = 'AS' AND ...
)
ORDER BY student_id, semester;
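For comparison, the same filter can be written with a single correlated `EXISTS` instead of two nested `IN` subqueries. This is a sketch that has not been timed against this data, so it may or may not produce a better plan. The aliases `ss`, `s`, and `f` are illustrative, and the conditions elided in the query above are left elided.

```sql
-- Same result set as the nested-IN query above (minus the elided filters):
-- every semester row of each student who entered in 2006 and has
-- at least one semester row with college = 'AS'.
SELECT ss.*
FROM student_semester AS ss
WHERE EXISTS (
    SELECT 1
    FROM student AS s
    JOIN student_semester AS f ON f.student_id = s.student_id
    WHERE s.student_id = ss.student_id
      AND s.entryyear = 2006
      AND f.college = 'AS'
      -- any further filters from the original query would go here
)
ORDER BY ss.student_id, ss.semester;
```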
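However, this results in relatively long-running queries (400 ms) when I am selecting about 1k students. According to the execution plan, most of the time is spent doing a hash join. To improve on this, I added the satquan, actcomp, and entryyear columns to the student_semester table. This reduces the query time by about 90%, but results in a lot of redundant data. Is there a better way to do this?
These are the indexes I currently have (along with the implicit indexes on student_id):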
CREATE INDEX act_sat_entryyear ON student USING btree (entryyear, actcomp, sattotal)
CREATE INDEX student_id_major_college ON student_semester USING btree (student_id, major, college)
Query Plan
QUERY PLAN
Hash Join (cost=17311.74..35895.38 rows=81896 width=65) (actual time=121.097..326.934 rows=25680 loops=1)
Hash Cond: (public.student_semester.student_id = public.student_semester.student_id)
-> Seq Scan on student_semester (cost=0.00..14307.20 rows=698820 width=65) (actual time=0.015..154.582 rows=698820 loops=1)
-> Hash (cost=17284.89..17284.89 rows=2148 width=8) (actual time=121.062..121.062 rows=1284 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 51kB
-> HashAggregate (cost=17263.41..17284.89 rows=2148 width=8) (actual time=120.708..120.871 rows=1284 loops=1)
-> Hash Semi Join (cost=1026.68..17254.10 rows=3724 width=8) (actual time=4.828..119.619 rows=6184 loops=1)
Hash Cond: (public.student_semester.student_id = student.student_id)
-> Seq Scan on student_semester (cost=0.00..16054.25 rows=42908 width=4) (actual time=0.013..109.873 rows=42331 loops=1)
Filter: ((college)::text = 'AS'::text)
-> Hash (cost=988.73..988.73 rows=3036 width=4) (actual time=4.801..4.801 rows=3026 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 107kB
-> Bitmap Heap Scan on student (cost=71.78..988.73 rows=3036 width=4) (actual time=0.406..3.223 rows=3026 loops=1)
Recheck Cond: (entryyear = 2006)
-> Bitmap Index Scan on student_act_sat_entryyear_index (cost=0.00..71.03 rows=3036 width=0) (actual time=0.377..0.377 rows=3026 loops=1)
Index Cond: (entryyear = 2006)
Total runtime: 327.708 ms
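I was mistaken about there not being a sequential scan in the query. I think the Seq Scan is being done because of the number of rows that match the college condition; when I change it to a college with fewer students, an index is used. Source: https://stackoverflow.com/a/5203827/880928
Query with the entryyear column included in the student_semester table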
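One way to check whether the sequential scan is a cost-based choice, as guessed above, is to discourage sequential scans for the current session and compare the resulting plan and timing. This is a diagnostic sketch only: `enable_seqscan` is a standard planner setting, but the query shown is just the college filter on its own, not the full query.

```sql
-- Diagnostic sketch only: make the planner avoid sequential scans
-- in this session to see which plan it falls back to and how long it takes.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT student_id
FROM student_semester
WHERE college = 'AS';

-- Restore the default planner behaviour afterwards.
RESET enable_seqscan;
```

If the forced plan turns out slower than the sequential scan, the planner's original choice was reasonable for that many matching rows.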
SELECT * FROM student_semester
WHERE student_id IN(
SELECT student_id FROM student_semester
WHERE entryyear = 2006 AND collgs = 'AS'
) ORDER BY student_id, semester;
Query Plan
Sort (cost=18597.13..18800.49 rows=81343 width=65) (actual time=72.946..74.003 rows=25680 loops=1)
Sort Key: public.student_semester.student_id, public.student_semester.semester
Sort Method: quicksort Memory: 3546kB
-> Nested Loop (cost=9843.87..11962.91 rows=81343 width=65) (actual time=24.617..40.751 rows=25680 loops=1)
-> HashAggregate (cost=9843.87..9845.73 rows=186 width=4) (actual time=24.590..24.836 rows=1284 loops=1)
-> Bitmap Heap Scan on student_semester (cost=1612.75..9834.63 rows=3696 width=4) (actual time=10.401..23.637 rows=6184 loops=1)
Recheck Cond: (entryyear = 2006)
Filter: ((collgs)::text = 'AS'::text)
-> Bitmap Index Scan on entryyear_act_sat_semester_enrolled_cumdeg_index (cost=0.00..1611.82 rows=60192 width=0) (actual time=10.259..10.259 rows=60520 loops=1)
Index Cond: (entryyear = 2006)
-> Index Scan using student_id_index on student_semester (cost=0.00..11.13 rows=20 width=65) (actual time=0.003..0.010 rows=20 loops=1284)
Index Cond: (student_id = public.student_semester.student_id)
Total runtime: 74.938 ms
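Please post the execution plan using 'explain analyze', along with any indexes defined on the tables. More about posting questions like this here: https://wiki.postgresql.org/wiki/Slow_Query_Questions – 2013-05-08 16:48:47
When asking for performance optimization, you also have to provide your version of Postgres. That should go without saying. Read the [tag info for postgresql-performance](http://stackoverflow.com/tags/postgresql-performance/info) – 2013-05-08 16:50:55
@ErwinBrandstetter I didn't post the Postgres version because I thought this was more of a generic database schema/query strategy question, but I will add the version as well as the query plan. – cmorse 2013-05-08 17:05:01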