2015-10-27 24 views

BigQuery - customising the offset when using the LAG function

I have a BigQuery table as follows:

date      hits_eventInfo_Category  hits_eventInfo_Action  session_id  user_id  hits_time  hits_eventInfo_Label
20151021  Air                      Search                 1445001     A232     1952       City1
20151021  Air                      Select                 1445001     A232     2300       Vendor1
20151021  Air                      Search                 1445001     A111     1000       City2
20151021  Air                      Search                 1445001     A111     1900       City3
20151021  Air                      Select                 1445001     A111     7380       Vendor2
20151021  Air                      Search                 1445001     A580     1000       City4
20151021  Air                      Search                 1445001     A580     1900       City5
20151021  Air                      Search                 1445001     A580     1900       City6
20151021  Air                      Select                 1445001     A580     7380       Vendor3

The table shows the activity of 3 users (A232, A111 and A580), such that:

i) A232 - Made 1 Search at 'City1' and chose 'Vendor1' from 'City1' 
ii) A111 - Made the 1st search at 'City2' and did not choose any vendor from there. Made a 2nd search at 'City3' and then ultimately chose a 'Vendor2' from here. 
iii) A580 - 1st search at 'City4', no vendor chosen. 2nd search at 'City5', no vendor chosen. 3rd search at 'City6', 'Vendor3' chosen from City6. 

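I am only interested in retrieving the cities where a user actually selected a vendor, i.e. I am not interested in earlier searches that did not lead to a vendor being selected.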

Desired output table:

date      hits_eventInfo_Category  hits_eventInfo_Action  session_id  user_id  hits_time  city   vendor
20151021  Air                      Search                 1445001     A232     1952       City1  Vendor1
20151021  Air                      Search                 1445001     A111     1900       City3  Vendor2
20151021  Air                      Search                 1445001     A580     1900       City6  Vendor3

I have been attempting this with the LAG function on the hits_eventInfo_eventLabel field, partitioning by user_id and ordering by hits_time, i.e. LAG(hits_eventInfo_eventLabel,1) OVER(PARTITION BY user_id ORDER BY hits_time)

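However, because my lag offset after partitioning is 1, the expression above only gives me the desired output for user A232 (he made only 1 search, so the record immediately before the vendor selection is guaranteed to be a search record).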

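Is there a way to make this LAG expression more dynamic, so that it retrieves only the location searched immediately before a selection was made, regardless of how many searches preceded that selection?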

OR

Is there an alternative function/approach I could use to achieve this?

Answer

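select
    date,
    hits_eventInfo_Category,
    hits_eventInfo_Action,
    session_id,
    user_id,
    hits_time,
    prev as city,
    hits_eventInfo_Label as vendor
from (
    -- for every row, fetch the label of the same user's previous row (ordered by hits_time)
    select *,
    lag(hits_eventInfo_Label, 1) over(partition by user_id order by hits_time) as prev
    from dataset.table
)
-- keep only the selection rows: prev is then the label of the row just before the selection
where hits_eventInfo_Action = 'Select'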
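
The query above relies on the row immediately before each 'Select' being the relevant 'Search'. As a hedged alternative, the sketch below looks back over all earlier rows of the same user and keeps the label of the most recent 'Search', however many rows precede the selection. It assumes BigQuery standard SQL (for IGNORE NULLS and the backticked table name) rather than the legacy dialect, and it reuses the dataset.table placeholder from the answer; the alias last_search_city is made up for illustration.

```sql
-- Sketch only: assumes BigQuery standard SQL and the column names from the question.
SELECT
  date,
  hits_eventInfo_Category,
  hits_eventInfo_Action,
  session_id,
  user_id,
  hits_time,
  last_search_city AS city,          -- illustrative alias, not from the original answer
  hits_eventInfo_Label AS vendor
FROM (
  SELECT
    *,
    -- Label of the most recent earlier 'Search' row for this user.
    -- Non-search rows are mapped to NULL and skipped by IGNORE NULLS.
    LAST_VALUE(
      IF(hits_eventInfo_Action = 'Search', hits_eventInfo_Label, NULL) IGNORE NULLS
    ) OVER (
      PARTITION BY user_id
      ORDER BY hits_time
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS last_search_city
  FROM `dataset.table`
)
WHERE hits_eventInfo_Action = 'Select'
```

Like the original query, this returns the 'Select' rows, so hits_eventInfo_Action and hits_time are those of the selection rather than of the search shown in the desired output. Note also that for user A580 the searches for City5 and City6 share hits_time 1900, so ordering by hits_time alone does not determine which of the two counts as the latest; if the table has a tie-breaking column (for example a hit sequence number), it would need to be added to the ORDER BY.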