Optimizing a nested-join window function on a large PostgreSQL table

I have been running the following query against a 56 GB table (789,700,760 rows) and am hitting a bottleneck in execution time. From some of my earlier examples it looks like there may be a way to 'nest' the INNER JOIN so that the query performs better on a large data set. In particular, the query below took 7.651 hours to finish executing on an MPP PostgreSQL deployment.
create table large_table as
select column1, column2, column3, column4, column5, column6
from (
    select
        a.column1, a.column2, a.start_time,
        rank() OVER (
            PARTITION BY a.column2, a.column1 ORDER BY a.start_time DESC
        ) as rank,
        last_value(a.column3) OVER (
            PARTITION BY a.column2, a.column1 ORDER BY a.start_time ASC
            RANGE BETWEEN unbounded preceding and unbounded following
        ) as column3,
        a.column4, a.column5, a.column6
    from (
        table2 s
        INNER JOIN table3 t
            ON s.column2 = t.column2 and s.event_time > t.start_time
    ) a
) b
where rank = 1;
Question 1: Is there a way to modify the above SQL code to speed up the overall execution time of the query?
If rank = 1 returns only one row per column2, column1 combination, then the last_value() seems redundant. Are you expecting multiple rows? Otherwise, the value of column3 on the rank = 1 row should be the same as the computed value. – 2012-07-11 18:45:30
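
A minimal sketch of the simplification that comment hints at, assuming the commenter is right that the rank = 1 row (the latest start_time per column2, column1 pair) already carries the column3 value the last_value() window recomputes. The table and column names are just the placeholders from the question, and this is an untested rewrite, not a benchmarked fix:

-- Drops the last_value() window: on the rank = 1 row, a.column3 is already
-- the value from the row with the latest start_time, which is what
-- last_value(a.column3) OVER (... ORDER BY a.start_time ASC, unbounded frame)
-- would return for that partition.
create table large_table as
select column1, column2, column3, column4, column5, column6
from (
    select
        a.column1, a.column2, a.start_time,
        rank() OVER (
            PARTITION BY a.column2, a.column1 ORDER BY a.start_time DESC
        ) as rank,
        a.column3,
        a.column4, a.column5, a.column6
    from (
        table2 s
        INNER JOIN table3 t
            ON s.column2 = t.column2 and s.event_time > t.start_time
    ) a
) b
where rank = 1;

If the planner was sorting the joined set once per window (the two OVER clauses order in opposite directions), dropping the second window should save a full sort of roughly 790 million rows. Note also that rank() keeps every row tied for the latest start_time; if exactly one row per column2, column1 pair is wanted, row_number() is the usual substitute.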