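We have a simple generic table structure, implemented in PostgreSQL (8.3; 9.1 is on our horizon). This seems to be a very straightforward and common implementation. It boils down to this: query for records that are linked, via key-value pairs, to the records that actually match the criteria.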
CREATE TABLE events_event_types
(
  -- this table holds some 50 rows
  id bigserial PRIMARY KEY,
  "name" character varying(255)
);
CREATE TABLE events_events
(
  -- this table holds some 15M rows
  id bigserial PRIMARY KEY,
  datetime timestamp with time zone,
  eventtype_id bigint REFERENCES events_event_types (id)
);
CREATE TABLE events_eventdetails
(
  -- this table holds some 65M rows
  id bigserial PRIMARY KEY,
  keyname character varying(255),
  "value" text,
  event_id bigint REFERENCES events_events (id)
);
Some rows of the events_events and events_eventdetails tables would look like this:
events_events                  | events_eventdetails
id   datetime  eventtype_id    | id    keyname        value         event_id
-------------------------------|--------------------------------------------
100  ...       10              | 1000  transactionId  9774ae16-...  100
                               | 1001  someKey        some value    100
200  ...       20              | 2000  transactionId  9774ae16-...  200
                               | 2001  reductionId    123           200
                               | 2002  reductionId    456           200
300  ...       30              | 3000  transactionId  9774ae16-...  300
                               | 3001  customerId     234           300
                               | 3002  companyId      345           300
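We are after a "solution" that returns events_events rows 100, 200 and 300 together in a single result set, and FAST, when asked for reductionId = 123, or when asked for customerId = 234, or when asked for companyId = 345. The three events belong together because they share the same transactionId. (We may be interested in an AND combination of these criteria, but that is not the goal.) Not sure whether it matters at this point, but the result set should be filterable on a datetime range and on eventtype_id (an IN list), and should take a LIMIT.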
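To make the requirement concrete, here is a naive way to phrase the customerId = 234 case as a single query against the tables above. It is only a sketch of the intended result, not a fast solution; the eventtype_id values and the datetime bounds are placeholders chosen for illustration.

```sql
-- Naive statement of the requirement: find every event that shares a
-- transactionId with an event carrying customerId = 234.
-- The IN list and the datetime bounds are illustrative placeholders.
SELECT e.*
FROM events_events e
JOIN events_eventdetails t
  ON t.event_id = e.id
 AND t.keyname = 'transactionId'
WHERE t.value IN (
        -- transactionIds of the events that match the criterion directly
        SELECT t2.value
        FROM events_eventdetails m
        JOIN events_eventdetails t2
          ON t2.event_id = m.event_id
         AND t2.keyname = 'transactionId'
        WHERE m.keyname = 'customerId'
          AND m.value = '234'
      )
  AND e.eventtype_id IN (10, 20, 30)
  AND e.datetime >= '2012-01-01'
  AND e.datetime <  '2012-05-01'
ORDER BY e.datetime DESC
LIMIT 50;
```

Against the sample rows, event 300 matches directly, and events 100 and 200 are pulled in through the shared transactionId, provided their datetime falls inside the range.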
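I ask for a "solution" because it could be any of:
- a single query
- two smaller queries (as long as their intermediate result is always small enough; I went with this approach and got stuck on companies (companyId) with a large number of associated transactions (~20K) (transactionId))
- a subtle redesign (e.g. denormalization)
This is not a fresh question: we have tried all three approaches over several months (I won't bother you with those queries), but they all failed on performance. The solution should return in <<< 1 second. Previous attempts took approximately 10 seconds at best.
I would really appreciate some help. I am at a loss right now...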
The two-smaller-queries approach looks much like this:
Query 1:
SELECT Substring(details2_transvalue.VALUE, 0, 32)
FROM events_eventdetails details2_transvalue
JOIN events_eventdetails compdetails ON details2_transvalue.event_id = compdetails.event_id
AND compdetails.keyname = 'companyId'
AND Substring(compdetails.VALUE, 0, 32) = '4'
AND details2_transvalue.keyname = 'transactionId'
Query 2:
SELECT events1.*
FROM events_events events1
JOIN events_eventdetails compDetails ON events1.id = compDetails.event_id
AND compDetails.keyname='companyId'
AND substring(compDetails.value,0,32)='4'
WHERE events1.eventtype_id IN (...)
UNION
SELECT events2.*
FROM events_events events2
JOIN events_eventdetails details2_transKey ON events2.id = details2_transKey.event_id
AND details2_transKey.keyname='transactionId'
AND substring(details2_transKey.value,0,32) IN (-- result of query 1 goes here --)
WHERE events2.eventtype_id IN (...)
ORDER BY dateTime DESC LIMIT 50
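Performance degrades because query 1 returns a large set.
As you can see, values in the events_eventdetails table are always expressed as a length-32 substring, which we have indexed. There are further indexes on keyname, on event_id, on event_id + keyname, and on keyname + the length-32 substring.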
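For reference, expression indexes of the kind described might be declared as sketched below. The first index name and its column list are inferred from the query plans shown further down (the Index Cond lines use both conditions); the second index name is hypothetical. One detail worth knowing: PostgreSQL counts substring positions from 1, so substring(value, 0, 32) yields at most 31 characters, not 32.

```sql
-- Expression index matching the predicate used in the queries:
--   keyname = '...' AND substring(value, 0, 32) = '...'
-- Column list inferred from the Index Cond lines in the plans; an assumption.
CREATE INDEX events_eventdetails_substring_ind
    ON events_eventdetails (keyname, (substring(value, 0, 32)));

-- Hypothetical name: plain index on (event_id, keyname) for the joins.
CREATE INDEX events_eventdetails_event_keyname_ind
    ON events_eventdetails (event_id, keyname);

-- Position 0 lies before the first character, so one position is "wasted":
SELECT substring('abcdef', 0, 3);  -- 'ab'
SELECT substring('abcdef', 1, 3);  -- 'abc'
```

The planner only considers an expression index when the query uses the same expression, which is why every query here repeats substring(value, 0, 32) literally.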
Here is a PostgreSQL 9.1 approach, although I do not officially have that platform at my disposal:
WITH companyevents AS (
SELECT events1.*
FROM events_events events1
JOIN events_eventdetails compDetails
ON events1.id = compDetails.event_id
AND compDetails.keyname='companyId'
AND substring(compDetails.value,0,32)=' -- my desired companyId -- '
WHERE events1.eventtype_id in (...)
ORDER BY dateTime DESC
LIMIT 50
)
SELECT * from events_events
WHERE transaction_id IN (SELECT transaction_id FROM companyevents)
OR id IN (SELECT id FROM companyevents)
AND eventtype_id IN (...)
ORDER BY dateTime DESC
LIMIT 250;
The query plan, for a companyId with 28228 transactionIds, is as follows:
Limit (cost=7545.99..7664.33 rows=250 width=130) (actual time=210.100..3026.267 rows=50 loops=1)
CTE companyevents
-> Limit (cost=7543.62..7543.74 rows=50 width=130) (actual time=206.994..207.020 rows=50 loops=1)
-> Sort (cost=7543.62..7544.69 rows=429 width=130) (actual time=206.993..207.005 rows=50 loops=1)
Sort Key: events1.datetime
Sort Method: top-N heapsort Memory: 23kB
-> Nested Loop (cost=10.02..7529.37 rows=429 width=130) (actual time=0.093..178.719 rows=28228 loops=1)
-> Append (cost=10.02..1140.62 rows=657 width=8) (actual time=0.082..27.594 rows=28228 loops=1)
-> Bitmap Heap Scan on events_eventdetails compdetails (cost=10.02..394.47 rows=97 width=8) (actual time=0.021..0.021 rows=0 loops=1)
Recheck Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '4'::text))
-> Bitmap Index Scan on events_eventdetails_substring_ind (cost=0.00..10.00 rows=97 width=0) (actual time=0.019..0.019 rows=0 loops=1)
Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '4'::text))
-> Index Scan using events_eventdetails_companyid_substring_ind on events_eventdetails_companyid compdetails (cost=0.00..746.15 rows=560 width=8) (actual time=0.061..18.655 rows=28228 loops=1)
Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '4'::text))
-> Index Scan using events_events_pkey on events_events events1 (cost=0.00..9.71 rows=1 width=130) (actual time=0.004..0.004 rows=1 loops=28228)
Index Cond: (id = compdetails.event_id)
Filter: (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[]))
-> Index Scan Backward using events_events_datetime_ind on events_events (cost=2.25..1337132.75 rows=2824764 width=130) (actual time=210.100..3026.255 rows=50 loops=1)
Filter: ((hashed SubPlan 2) OR ((hashed SubPlan 3) AND (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[]))))
SubPlan 2
-> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=90) (actual time=206.998..207.071 rows=50 loops=1)
SubPlan 3
-> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=8) (actual time=0.001..0.026 rows=50 loops=1)
Total runtime: 3026.410 ms
The query plan, for a companyId with 288 transactionIds, is as follows:
Limit (cost=7545.99..7664.33 rows=250 width=130) (actual time=30.976..3790.362 rows=54 loops=1)
CTE companyevents
-> Limit (cost=7543.62..7543.74 rows=50 width=130) (actual time=9.263..9.290 rows=50 loops=1)
-> Sort (cost=7543.62..7544.69 rows=429 width=130) (actual time=9.263..9.272 rows=50 loops=1)
Sort Key: events1.datetime
Sort Method: top-N heapsort Memory: 24kB
-> Nested Loop (cost=10.02..7529.37 rows=429 width=130) (actual time=0.071..8.195 rows=1025 loops=1)
-> Append (cost=10.02..1140.62 rows=657 width=8) (actual time=0.060..1.348 rows=1025 loops=1)
-> Bitmap Heap Scan on events_eventdetails compdetails (cost=10.02..394.47 rows=97 width=8) (actual time=0.021..0.021 rows=0 loops=1)
Recheck Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '5'::text))
-> Bitmap Index Scan on events_eventdetails_substring_ind (cost=0.00..10.00 rows=97 width=0) (actual time=0.019..0.019 rows=0 loops=1)
Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '5'::text))
-> Index Scan using events_eventdetails_companyid_substring_ind on events_eventdetails_companyid compdetails (cost=0.00..746.15 rows=560 width=8) (actual time=0.039..1.006 rows=1025 loops=1)
Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '5'::text))
-> Index Scan using events_events_pkey on events_events events1 (cost=0.00..9.71 rows=1 width=130) (actual time=0.005..0.006 rows=1 loops=1025)
Index Cond: (id = compdetails.event_id)
Filter: (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[]))
-> Index Scan Backward using events_events_datetime_ind on events_events (cost=2.25..1337132.75 rows=2824764 width=130) (actual time=30.975..3790.332 rows=54 loops=1)
Filter: ((hashed SubPlan 2) OR ((hashed SubPlan 3) AND (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[]))))
SubPlan 2
-> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=90) (actual time=9.266..9.327 rows=50 loops=1)
SubPlan 3
-> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=8) (actual time=0.001..0.019 rows=50 loops=1)
Total runtime: 3796.736 ms
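At 3 s / 4 s this is not bad, but it is still a factor of 100+ too slow. Also, this was not run on relevant hardware. Nonetheless, it should show where the pain is.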
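Both plans spend nearly all of their time in the backward scan of events_events_datetime_ind, where the OR of the two hashed subplans is applied as a filter to each row. One thing that might be worth trying is to split the OR into two branches with UNION, so that each branch has a chance to use an index of its own. This is an untested sketch: it assumes an index exists on events_events (transaction_id), uses a placeholder eventtype_id list, and applies the eventtype_id filter to both branches, which the query above does not do because AND binds tighter than OR.

```sql
WITH companyevents AS (
    SELECT events1.*
    FROM events_events events1
    JOIN events_eventdetails compDetails
      ON events1.id = compDetails.event_id
     AND compDetails.keyname = 'companyId'
     AND substring(compDetails.value, 0, 32) = '4'
    WHERE events1.eventtype_id IN (10, 20, 30)   -- placeholder list
    ORDER BY events1.datetime DESC
    LIMIT 50
)
-- Branch 1: events sharing a transaction with the company's events.
SELECT e.*
FROM events_events e
WHERE e.transaction_id IN (SELECT transaction_id FROM companyevents)
  AND e.eventtype_id IN (10, 20, 30)
UNION
-- Branch 2: the company's events themselves.
SELECT e.*
FROM events_events e
WHERE e.id IN (SELECT id FROM companyevents)
  AND e.eventtype_id IN (10, 20, 30)
ORDER BY datetime DESC
LIMIT 250;
```

UNION (rather than UNION ALL) removes the rows that both branches return, so the result set stays free of duplicates.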
Here is something that might grow into a solution:
Added a table:
CREATE TABLE events_transaction_helper
(
  event_id bigint NOT NULL,
  transactionid character varying(36) NOT NULL,
  keyname character varying(255) NOT NULL,
  value bigint NOT NULL
  -- index on (keyname, value)
);
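I filled this table "manually" for now, but a materialized view implementation would do the trick. It would pretty much follow the query below: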
SELECT tr.event_id, tr.value AS transactionid, det.keyname, det.value AS value
FROM events_eventdetails tr
JOIN events_eventdetails det ON det.event_id = tr.event_id
WHERE tr.keyname = 'transactionId'
AND det.keyname
IN ('companyId', 'reduction_id', 'customer_id');
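Since CREATE MATERIALIZED VIEW is not available in PostgreSQL 9.1 (it arrived in 9.3), the helper table has to be filled by hand for now. A minimal sketch of that fill follows. It assumes the matched values are always numeric (the helper column is bigint while events_eventdetails.value is text), and it uses the key names as they appear in the sample rows rather than the spellings in the query above.

```sql
INSERT INTO events_transaction_helper (event_id, transactionid, keyname, value)
SELECT tr.event_id,
       tr.value,               -- must fit in varchar(36), else the insert fails
       det.keyname,
       det.value::bigint       -- assumption: these values are always numeric
FROM events_eventdetails tr
JOIN events_eventdetails det ON det.event_id = tr.event_id
WHERE tr.keyname = 'transactionId'
  AND det.keyname IN ('companyId', 'reductionId', 'customerId');

-- Index name taken from the plan further down; columns from the table comment.
CREATE INDEX testmaterializedviewkeynamevalue
    ON events_transaction_helper (keyname, value);
```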
Added a column to the events_events table:
transaction_id character varying(36) null
This new column is filled as follows:
update events_events
set transaction_id =
(select value from events_eventdetails
where keyname='transactionId'
and event_id=events_events.id);
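For completeness, the column and a supporting index could be created as sketched here. The index name testtransactionid comes from the plan further down; note that the query and plan there spell the column transactionid, so one spelling has to be used consistently. The join form of the UPDATE is an alternative to the correlated subquery above, which raises an error if an event ever has more than one transactionId row.

```sql
ALTER TABLE events_events
    ADD COLUMN transaction_id character varying(36) NULL;

-- Join form of the fill; if an event has several transactionId rows,
-- one of them is picked arbitrarily instead of raising an error.
UPDATE events_events e
SET transaction_id = d.value
FROM events_eventdetails d
WHERE d.event_id = e.id
  AND d.keyname = 'transactionId';

-- Supports the lookup of events by transaction.
CREATE INDEX testtransactionid
    ON events_events (transaction_id);
```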
Now the following query consistently returns in < 15 ms:
explain analyze select * from events_events
where transactionId in
(select distinct transactionid
from events_transaction_helper
WHERE keyname='companyId' and value=5)
and eventtype_id in (...)
order by datetime desc limit 250;
Limit (cost=5075.23..5075.85 rows=250 width=130) (actual time=8.901..9.028 rows=250 loops=1)
-> Sort (cost=5075.23..5077.19 rows=785 width=130) (actual time=8.900..8.953 rows=250 loops=1)
Sort Key: events_events.datetime
Sort Method: top-N heapsort Memory: 81kB
-> Nested Loop (cost=57.95..5040.04 rows=785 width=130) (actual time=0.928..8.268 rows=524 loops=1)
-> HashAggregate (cost=52.30..52.42 rows=12 width=37) (actual time=0.895..0.991 rows=276 loops=1)
-> Subquery Scan on "ANY_subquery" (cost=52.03..52.27 rows=12 width=37) (actual time=0.558..0.757 rows=276 loops=1)
-> HashAggregate (cost=52.03..52.15 rows=12 width=37) (actual time=0.556..0.638 rows=276 loops=1)
-> Index Scan using testmaterializedviewkeynamevalue on events_transaction_helper (cost=0.00..51.98 rows=22 width=37) (actual time=0.068..0.404 rows=288 loops=1)
Index Cond: (((keyname)::text = 'companyId'::text) AND (value = 5))
-> Bitmap Heap Scan on events_events (cost=5.65..414.38 rows=100 width=130) (actual time=0.023..0.024 rows=2 loops=276)
Recheck Cond: ((transactionid)::text = ("ANY_subquery".transactionid)::text)
Filter: (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[]))
-> Bitmap Index Scan on testtransactionid (cost=0.00..5.63 rows=100 width=0) (actual time=0.020..0.020 rows=2 loops=276)
Index Cond: ((transactionid)::text = ("ANY_subquery".transactionid)::text)
Total runtime: 9.122 ms
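The plan contains two HashAggregate nodes on top of the helper-table scan: one for the DISTINCT and one added by the planner for the IN. Because IN already treats its subquery as a set, the DISTINCT can be dropped without changing the result; whether that changes the timing noticeably is untested. The eventtype_id list below is a placeholder.

```sql
SELECT *
FROM events_events
WHERE transactionid IN (
        -- no DISTINCT needed: IN ignores duplicates in the subquery
        SELECT transactionid
        FROM events_transaction_helper
        WHERE keyname = 'companyId'
          AND value = 5
      )
  AND eventtype_id IN (10, 20, 30)   -- placeholder list
ORDER BY datetime DESC
LIMIT 250;
```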
I will come back later and let you know whether this really turns into a viable solution :)
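Please do bother us with those queries, so that we know what you have already tried. – 2012-04-10 23:17:21
So your event detail criteria are a varchar(255) and a text field? You should pat yourself on the back for getting it down to 10 seconds. An Event_Keys table with an int and a varchar to normalize them, plus indexes, would be a start, but the text field is a problem, though... – 2012-04-10 23:23:35
@Ben: Posting those queries would make my comment too long. Not sure how to get around that. – 2012-04-10 23:52:38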