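I am migrating data from SQL Server to Postgres. Two inner joins on tables with 100k records are extremely slow.
I changed my table structure to handle generic sports matches, but it is giving me performance problems.
I have the following tables:
- matches (id, start_time)
- match_teams (id, match_id, team_id, score)
- match_players (id, lineup_id, player_id), where lineup_id is a foreign key on match_teams.id
I select all matches with the following query: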
SELECT * FROM matches AS m
INNER JOIN match_teams AS t ON m.id = t.match_id
INNER JOIN match_players AS p ON t.id = p.lineup_id
With 100k records, this query takes about 6 minutes:
-- Executing query:
SELECT * FROM matches AS m
INNER JOIN match_teams AS t ON m.id = t.match_id
INNER JOIN match_players AS p ON t.id = p.lineup_id
Total query runtime: 336360 ms.
1142078 rows retrieved.
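In SQL Server, where I had all of this data in one table, the query returned in less than 5 seconds. In Postgres, I also put the data into a single table using jsonb and was able to run the query above in 40 seconds.
How can I make this query faster? I would like to get it down to a few seconds.
Reading online, I found that creating indexes can speed up these joins. I created the following indexes: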
CREATE INDEX match_teams_match_id_idx ON match_teams USING btree (match_id);
CREATE INDEX match_players_lineup_id_idx ON match_players USING btree (lineup_id);
CREATE INDEX match_players_player_id_idx ON match_players USING btree (player_id);
CREATE INDEX matches_id_idx ON matches USING btree (id);
These indexes did not make the query any faster. Am I missing something?
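Before adding more indexes, it may help to see which ones already exist. The following is a minimal sketch, assuming the tables live in the public schema (as the plan output suggests). It reads the built-in pg_indexes view. A primary key already creates a unique index on its column, so an extra index such as matches_id_idx on matches(id) would likely be redundant.

```sql
-- List the indexes that already exist on the three tables.
-- Assumption: the tables are in the "public" schema.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename IN ('matches', 'match_teams', 'match_players')
ORDER BY tablename, indexname;
```

Because the query reads every row of all three tables, sequential scans with hash joins are a reasonable plan for it, and indexes on the join columns would not be expected to change much here.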
Here is the EXPLAIN ANALYZE VERBOSE output for the query above:
Hash Join  (cost=19314.10..67893.04 rows=1135917 width=24) (actual time=401.225..1624.906 rows=1142078 loops=1)
  Output: m.id, m.start_time, t.team_id, t.rank, p.player_id
  Hash Cond: (p.lineup_id = t.id)
  ->  Seq Scan on public.match_players p  (cost=0.00..19818.78 rows=1142078 width=8) (actual time=0.039..356.168 rows=1142078 loops=1)
        Output: p.player_id, p.lineup_id
  ->  Hash  (cost=15119.58..15119.58 rows=228442 width=24) (actual time=401.123..401.123 rows=228442 loops=1)
        Output: m.id, m.start_time, t.team_id, t.rank, t.id
        Buckets: 8192  Batches: 4  Memory Usage: 3358kB
        ->  Hash Join  (cost=5097.97..15119.58 rows=228442 width=24) (actual time=74.766..310.864 rows=228442 loops=1)
              Output: m.id, m.start_time, t.team_id, t.rank, t.id
              Hash Cond: (t.match_id = m.id)
              ->  Seq Scan on public.match_teams t  (cost=0.00..3519.42 rows=228442 width=16) (actual time=0.004..64.580 rows=228442 loops=1)
                    Output: t.team_id, t.rank, t.match_id, t.id
              ->  Hash  (cost=3112.21..3112.21 rows=114221 width=12) (actual time=74.728..74.728 rows=114221 loops=1)
                    Output: m.id, m.start_time
                    Buckets: 16384  Batches: 2  Memory Usage: 2682kB
                    ->  Seq Scan on public.matches m  (cost=0.00..3112.21 rows=114221 width=12) (actual time=0.003..34.789 rows=114221 loops=1)
                          Output: m.id, m.start_time
Planning time: 0.448 ms
Execution time: 1799.412 ms
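For reference, a plan of this shape can be produced by prefixing the query with EXPLAIN and the ANALYZE and VERBOSE options. This is a sketch using the query from the question. The column list in the posted plan is shorter than what SELECT * would produce, so the statement actually used may have differed slightly.

```sql
-- ANALYZE executes the query and reports actual timings and row counts;
-- VERBOSE adds the "Output:" column lists to each plan node.
EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM matches AS m
INNER JOIN match_teams AS t ON m.id = t.match_id
INNER JOIN match_players AS p ON t.id = p.lineup_id;
```

The execution time reported this way covers the work done on the server only. It does not include sending the rows to the client or displaying them.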
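Update
Added the DDL here: http://pastie.org/10529040
Update 2
Postgres is running on an AWS RDS server. I tried running the query above from a clean EC2 server with a fresh pgAdmin installation. I got the same result: the query appears to run in ~2 seconds, but it takes ~6 minutes to display the data.
Update 3
I tried running this query from a simple C# program, and the results were returned in about 10 seconds. This appears to be a problem with pgAdmin.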
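One rough way to separate server-side join time from client display time is to run the same joins but return a single row. This is only a sketch: the planner is free to pick a different plan for count(*) than for the full SELECT, so the timing is an approximation rather than an exact measurement of the original query.

```sql
-- Performs the same two joins but returns one row instead of ~1.1 million,
-- so the client has almost nothing to transfer or render.
SELECT count(*)
FROM matches AS m
INNER JOIN match_teams AS t ON m.id = t.match_id
INNER JOIN match_players AS p ON t.id = p.lineup_id;
```

If this returns in a second or two from the same client that needs minutes for the full result set, the delay is in fetching and displaying the rows rather than in the joins.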
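Is the matches(id) index needed? (pk?) – jarlh
@jarlh matches(id) is the primary key. I created an index on it anyway, and it did not seem to do anything. I will update the list of indexes. – janderson
How did you do it in a single table in MS SQL? Denormalized wide rows? –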