2012-07-11 96 views
2

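Optimizing nested join window functions for large PostgreSQL tables

I have been running the following query against a 56 GB table (789,700,760 rows) and have hit a bottleneck in execution time. From some earlier examples I have seen, there may be a way to "nest" the INNER JOIN so that the query performs better on large data sets. In particular, the query below took 7.651 hours to complete on an MPP PostgreSQL deployment.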

create table large_table as 
select column1, column2, column3, column4, column5, column6 
from 
(
    select 
    a.column1, a.column2, a.start_time, 
    rank() OVER( 
     PARTITION BY a.column2, a.column1 order by a.start_time DESC 
    ) as rank, 
    last_value(a.column3) OVER (
     PARTITION BY a.column2, a.column1 order by a.start_time ASC 
     RANGE BETWEEN unbounded preceding and unbounded following 
    ) as column3, 
    a.column4, a.column5, a.column6 
    from 
    (table2 s 
     INNER JOIN table3 t 
     ON s.column2=t.column2 and s.event_time > t.start_time 
    ) a 
) b 
where rank =1; 

Question 1: Is there a way to modify the above sql code to speed up the overall execution time of the query?

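If rank returns only one row for each column2, column1 combination, then last_value() seems redundant. Are you expecting multiple rows? Otherwise, the value of column3 on the rank = 1 row should be the same as the computed value. – 2012-07-11 18:45:30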

Answer

1

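You can move the last_value to the outer query, which might buy you some performance improvement. Within each partition, last_value (ordered by start_time ascending over the full frame) returns the value of column3 on the row with the largest start_time, which is exactly the row where rank = 1: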

select column1, column2, 
     last_value(column3) OVER (PARTITION BY column2, column1 order by start_time ASC 
            RANGE BETWEEN unbounded preceding and unbounded following 
           ) as column3, 
     column4, column5, column6 
from (select a.column1, a.column2, a.start_time, 
      rank() OVER (PARTITION BY a.column2, a.column1 order by a.start_time DESC 
         ) as rank, 
      a.column3, a.column4, a.column5, a.column6 
     from (table2 s INNER JOIN 
      table3 t 
      ON s.column2 = t.column2 and s.event_time > t.start_time 
      ) a 
    ) b 
where rank = 1 

Otherwise, you would need to post the execution plan and more information about table2 and table3 to get further help.
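
To check the claim that last_value and the rank = 1 row agree, here is a minimal sketch on toy data. The table joined_rows and its values are hypothetical stand-ins for the joined rows (alias a in the question), and the sketch assumes start_time is unique within each (column2, column1) partition. If start_time has ties, rank = 1 returns several rows and the two columns may differ.

```sql
-- Hypothetical toy data standing in for the joined rows (alias "a" in the question).
create temporary table joined_rows (
    column1    int,
    column2    int,
    start_time timestamp,
    column3    text
);

insert into joined_rows values
    (1, 10, '2012-07-01 00:00', 'old'),
    (1, 10, '2012-07-02 00:00', 'new'),
    (2, 10, '2012-07-01 00:00', 'only');

-- Compare the plain column3 on the rank = 1 row with the last_value() result.
select column1, column2, column3, column3_last_value
from (
    select column1, column2, column3,
           rank() over (partition by column2, column1
                        order by start_time desc) as rank,
           last_value(column3) over (partition by column2, column1
                                     order by start_time asc
                                     range between unbounded preceding
                                               and unbounded following) as column3_last_value
    from joined_rows
) b
where rank = 1;
```

On this data, column3 and column3_last_value hold the same value on every returned row. If that also holds for the real tables, the last_value window could probably be dropped altogether, leaving a single window pass. Whether that helps on an MPP deployment would need to be confirmed against the execution plan.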

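Thanks for your help. I am testing the timing of the updated query, but I ran into a small problem when using last_value(a.column3). The error given is ERROR: missing FROM-clause entry for table "a". I replaced it with last_value(column3); does this still work? – user7980 2012-07-11 23:30:05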