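PostgreSQL CREATE TABLE AS with multiple WHERE equalities
I'm trying to create a new table that is the aggregated sum of 6 other tables with matching primary keys. The script is fast (< 5 seconds) when I use 2-3 input tables, but it stalls if I use more than 3: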
CREATE TABLE table_name AS SELECT table1.timestamp, table1.value + table2.value + table3.value + table4.value AS value FROM table1, table2, table3, table4 WHERE table1.timestamp=table2.timestamp AND table2.timestamp=table3.timestamp AND table3.timestamp=table4.timestamp;
I haven't let it run for longer than 5 minutes, but that is already far too slow for my purposes.
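Table description: every table has the same format with 6 columns (2 of which are relevant). The primary key is an integer "timestamp", and "value" is a real. Table sizes vary, but each table hovers around 100k rows. The tables mostly share the same primary keys, but each table is missing some data points, and it is essential that those data points are left out of the new table.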
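The following is a minimal sketch of the intended result. It runs against Python's built-in sqlite3 module with a few made-up rows, not against PostgreSQL, so it says nothing about speed. It keeps the table names of the four-table query above but rewrites the comma joins as explicit inner `JOIN ... ON` clauses. The helper names `sum_query` and `run` and the sample data are hypothetical. Every join is an inner join on "timestamp", so a timestamp missing from any one table drops out of the result.

```python
import sqlite3

# Same table names as the four-table query; the rows are made-up sample data.
TABLES = ["table1", "table2", "table3", "table4"]
ROWS = {
    "table1": [(1, 1.0), (2, 2.0), (3, 3.0)],
    "table2": [(1, 1.0), (2, 2.0), (3, 3.0)],
    "table3": [(1, 1.0), (3, 3.0)],  # timestamp 2 is missing here
    "table4": [(1, 1.0), (2, 2.0), (3, 3.0)],
}

def sum_query(tables):
    """Explicit-JOIN form of the comma-join query (hypothetical helper):
    each table is inner-joined to the first one on "timestamp", so a
    timestamp missing from any table is excluded from the result."""
    first = tables[0]
    total = " + ".join(f"{t}.value" for t in tables)
    joins = " ".join(
        f'JOIN {t} ON {t}."timestamp" = {first}."timestamp"' for t in tables[1:]
    )
    return f'SELECT {first}."timestamp", {total} AS value FROM {first} {joins}'

def run(tables=TABLES, rows=ROWS):
    """Build the sample tables in memory, run CREATE TABLE AS, return its rows."""
    con = sqlite3.connect(":memory:")
    for t in tables:
        con.execute(f'CREATE TABLE {t} ("timestamp" INTEGER PRIMARY KEY, value REAL)')
        con.executemany(f"INSERT INTO {t} VALUES (?, ?)", rows[t])
    con.execute(f"CREATE TABLE table_name AS {sum_query(tables)}")
    result = con.execute(
        'SELECT "timestamp", value FROM table_name ORDER BY "timestamp"'
    ).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    print(run())  # [(1, 4.0), (3, 12.0)]
```

In PostgreSQL, explicit inner `JOIN` syntax and the comma form are normally planned the same way (up to `join_collapse_limit`, which defaults to 8). This rewrite therefore makes the intent clearer but should not be expected to make the query faster by itself.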
Is there something I'm doing wrong, and what can I do to make it run fast?
EDIT:
PS: Here is the actual output of a full EXPLAIN ANALYZE of the query:
eldb=# EXPLAIN ANALYZE CREATE TABLE test_table AS SELECT count1.timestamp,
count1.year, count1.month, count1.day, count1.period, count1.the_value +
count2.the_value + count3.the_value + count4.the_value + count5.the_value +
count6.the_value AS the_value FROM "table_name-1" AS count1, "table_name-2" AS
count2, "table_name-3" AS count3, "table_name-4" AS count4, "table_name-5" AS
count5, "table_name-6" AS count6 WHERE count1.timestamp=count2.timestamp AND
count2.timestamp=count3.timestamp AND count3.timestamp=count4.timestamp AND
count4.timestamp=count5.timestamp AND count5.timestamp=count6.timestamp AND
count1.timestamp>2012020000 AND count2.timestamp>2012020000 AND
count3.timestamp>2012020000 AND count4.timestamp>2012020000 and
count5.timestamp>2012020000 AND count6.timestamp>2012020000;
QUERY PLAN
--------------------------------------------------------------------------------
Merge Join (cost=20323.61..153806457715456.50 rows=5592655588099248 width=44) (actual time=84.524..3310.692 rows=3410 loops=1)
  Merge Cond: (count1."timestamp" = count4."timestamp")
  ->  Nested Loop (cost=10161.80..4417379579.26 rows=1057606343 width=40) (actual time=44.597..1616.585 rows=3410 loops=1)
        Join Filter: (count2."timestamp" = count1."timestamp")
        ->  Merge Join (cost=10161.80..101480.96 rows=6070522 width=16) (actual time=43.648..48.950 rows=3410 loops=1)
              Merge Cond: (count2."timestamp" = count3."timestamp")
              ->  Sort (cost=5080.90..5168.01 rows=34844 width=8) (actual time=25.608..25.804 rows=3410 loops=1)
                    Sort Key: count2."timestamp"
                    Sort Method: quicksort Memory: 256kB
                    ->  Seq Scan on "table_name-2" count2 (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.064..23.297 rows=3410 loops=1)
                          Filter: ("timestamp" > 2012020000)
              ->  Materialize (cost=5080.90..5255.12 rows=34844 width=8) (actual time=18.030..19.847 rows=3410 loops=1)
                    ->  Sort (cost=5080.90..5168.01 rows=34844 width=8) (actual time=18.023..18.416 rows=3410 loops=1)
                          Sort Key: count3."timestamp"
                          Sort Method: quicksort Memory: 256kB
                          ->  Seq Scan on "table_name-3" count3 (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.023..16.294 rows=3410 loops=1)
                                Filter: ("timestamp" > 2012020000)
        ->  Materialize (cost=0.00..2351.88 rows=34844 width=24) (actual time=0.000..0.147 rows=3410 loops=3410)
              ->  Seq Scan on "table_name-1" count1 (cost=0.00..1972.66 rows=34844 width=24) (actual time=0.020..16.853 rows=3410 loops=1)
                    Filter: ("timestamp" > 2012020000)
  ->  Materialize (cost=10161.80..4007228099.11 rows=1057606343 width=24) (actual time=39.917..1687.402 rows=3410 loops=1)
        ->  Nested Loop (cost=10161.80..4004584083.26 rows=1057606343 width=24) (actual time=39.915..1685.956 rows=3410 loops=1)
              Join Filter: (count4."timestamp" = count6."timestamp")
              ->  Merge Join (cost=10161.80..101480.96 rows=6070522 width=16) (actual time=38.689..44.309 rows=3410 loops=1)
                    Merge Cond: (count4."timestamp" = count5."timestamp")
                    ->  Sort (cost=5080.90..5168.01 rows=34844 width=8) (actual time=18.960..19.156 rows=3410 loops=1)
                          Sort Key: count4."timestamp"
                          Sort Method: quicksort Memory: 256kB
                          ->  Seq Scan on "table_name-4" count4 (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.059..17.271 rows=3410 loops=1)
                                Filter: ("timestamp" > 2012020000)
                    ->  Materialize (cost=5080.90..5255.12 rows=34844 width=8) (actual time=19.717..21.826 rows=3410 loops=1)
                          ->  Sort (cost=5080.90..5168.01 rows=34844 width=8) (actual time=19.708..20.266 rows=3410 loops=1)
                                Sort Key: count5."timestamp"
                                Sort Method: quicksort Memory: 256kB
                                ->  Seq Scan on "table_name-5" count5 (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.034..18.001 rows=3410 loops=1)
                                      Filter: ("timestamp" > 2012020000)
              ->  Materialize (cost=0.00..2283.88 rows=34844 width=8) (actual time=0.000..0.148 rows=3410 loops=3410)
                    ->  Seq Scan on "table_name-6" count6 (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.036..17.785 rows=3410 loops=1)
                          Filter: ("timestamp" > 2012020000)
Total runtime: 3330.933 ms
(40 rows)
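This plan shows a large gap between estimated and actual row counts. Every Seq Scan estimates rows=34844 but returns rows=3410, and the top Merge Join estimates about 5.6e15 rows for 3410 actual. The sketch below is a small hypothetical helper, not part of PostgreSQL, that pulls both numbers out of text-format EXPLAIN ANALYZE lines. It assumes each node sits on one unwrapped line, although psql may wrap long output. One common cause of such gaps is out-of-date planner statistics, which `ANALYZE` refreshes. The plan alone does not prove that is the cause here.

```python
import re

# A text-format EXPLAIN ANALYZE node line, e.g.
#   ->  Sort (cost=5080.90..5168.01 rows=34844 width=8) (actual time=25.608..25.804 rows=3410 loops=1)
NODE_LINE = re.compile(
    r"\s*(?:->\s*)?(?P<node>[^(]+?)\s*"
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(?P<est>\d+) width=\d+\)\s*"
    r"\(actual time=[\d.]+\.\.[\d.]+ rows=(?P<act>\d+) loops=(?P<loops>\d+)\)"
)

def row_estimates(plan_text):
    """Yield (node, estimated_rows, actual_rows, loops) for every node line.
    Lines such as "Filter:" or "Sort Key:" do not match and are skipped.
    PostgreSQL reports actual rows as a per-loop average."""
    for line in plan_text.splitlines():
        m = NODE_LINE.match(line)
        if m:
            yield m["node"].strip(), int(m["est"]), int(m["act"]), int(m["loops"])

# Two lines copied from the plan above, unwrapped.
SAMPLE = """\
Merge Join (cost=20323.61..153806457715456.50 rows=5592655588099248 width=44) (actual time=84.524..3310.692 rows=3410 loops=1)
  ->  Seq Scan on "table_name-2" count2 (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.064..23.297 rows=3410 loops=1)
"""

if __name__ == "__main__":
    for node, est, act, loops in row_estimates(SAMPLE):
        ratio = est / max(act, 1)
        print(f"{node}: estimated {est}, actual {act} (loops={loops}), ratio {ratio:.3g}")
```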
Here is the table structure (the same for all tables):
CREATE TABLE "table_name-6"
(
"timestamp" integer NOT NULL,
year integer NOT NULL,
month integer NOT NULL,
day integer NOT NULL,
period integer NOT NULL,
the_value real,
CONSTRAINT "table_name-6_pkey" PRIMARY KEY ("timestamp")
)
Note: the actual table names and values have been changed. Also, this output covers only a small fraction of the real table sizes.
What do you want to happen if a particular key exists in only one of the four tables? – wildplasser
I don't want that key to be included in the new table (i.e., skip it completely). (PS: thanks for the quick response!) – TimY
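Is timestamp the primary key of each tableX? Do you have indexes on it? BTW, "timestamp" is a reserved word (a type) in PG; better to avoid it as an identifier. Also: please add a query plan. You can get one by prefixing the query with "EXPLAIN ANALYZE". – wildplasser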
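The requirement from the comments is that a key present in only some of the tables must be skipped entirely. That can also be expressed without a chain of joins:

1. Stack all tables with `UNION ALL`.
2. Group by timestamp.
3. Keep only the groups that occur once per table.

This is a different technique from the question's join, offered here as a hedged sketch. It relies on "timestamp" being unique within each table, which holds here because it is the primary key. It runs on Python's sqlite3 with made-up data, so it says nothing about PostgreSQL performance. The helper names `union_sum_query` and `demo` are hypothetical.

```python
import sqlite3

def union_sum_query(tables):
    """Sum "value" per timestamp across `tables`, keeping only timestamps
    present in every table. Assumes "timestamp" is unique per table (it is
    the primary key), so COUNT(*) equals len(tables) exactly when the key
    appears in all of them."""
    stacked = " UNION ALL ".join(f'SELECT "timestamp", value FROM {t}' for t in tables)
    return (
        f'SELECT "timestamp", SUM(value) AS value FROM ({stacked}) AS s '
        f'GROUP BY "timestamp" HAVING COUNT(*) = {len(tables)} '
        f'ORDER BY "timestamp"'
    )

def demo():
    """Run the query on made-up rows where timestamp 2 is missing from table3."""
    tables = ["table1", "table2", "table3", "table4"]
    rows = {
        "table1": [(1, 1.0), (2, 2.0), (3, 3.0)],
        "table2": [(1, 1.0), (2, 2.0), (3, 3.0)],
        "table3": [(1, 1.0), (3, 3.0)],
        "table4": [(1, 1.0), (2, 2.0), (3, 3.0)],
    }
    con = sqlite3.connect(":memory:")
    for t in tables:
        con.execute(f'CREATE TABLE {t} ("timestamp" INTEGER PRIMARY KEY, value REAL)')
        con.executemany(f"INSERT INTO {t} VALUES (?, ?)", rows[t])
    result = con.execute(union_sum_query(tables)).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    print(demo())  # [(1, 4.0), (3, 12.0)]
```

One behavioral difference applies because the DDL allows `the_value` to be NULL. `SUM` ignores NULLs, while `a + b + ...` in the join version yields NULL if any term is NULL. The two approaches therefore agree only when no value in a matched row is NULL.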