2016-06-16
0

Netezza: exclude specific records

I have some data in a Netezza table named web_event, in the following format.

vstr_id | sessn_id | sessn_ts            | wbpg_nm
V1      | V1S1     | 02-02-2015 09:20:00 | /home/login
V1      | V1S1     | 02-02-2015 09:22:00 | -1
V1      | V1S1     | 02-02-2015 09:30:00 | /home/contacts
V1      | V1S1     | 02-02-2015 09:32:00 | -1
V1      | V1S1     | 02-02-2015 09:50:00 | /home/search
V1      | V1S1     | 02-02-2015 09:55:00 | -1
V2      | V2S1     | 02-02-2015 09:10:00 | /home
V2      | V2S1     | 02-02-2015 09:15:00 | /home/apps
V2      | V2S2     | 02-02-2015 09:20:00 | /home/news
V2      | V2S2     | 02-02-2015 09:23:00 | /home/news/internal

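This is my source table.

I want to use this web_event table to create another table like the one below.

I want to load the sessn_durtn and time_on_pg tables as shown below.

1) time_on_page column: the time difference between the current page load and the next page load. If there is no further event or page load, the last page of the session can have 0 seconds. It can be expressed in minutes or seconds.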

Insert into time_on_pg (select VSTR_ID, 
      SESSN_ID, 
      sessn_ts, 
      WBPG_NM, 
      ????? as time_on_page 
      from web_event) 

vstr_id | sessn_id | sessn_ts            | wbpg_nm             | wanted_time_on_page | currently_known_time_on_page
V1      | V1S1     | 02-02-2015 09:20:00 | /home/login         | 10mins              | 2mins
V1      | V1S1     | 02-02-2015 09:22:00 | -1                  |                     | 8mins
V1      | V1S1     | 02-02-2015 09:30:00 | /home/contacts      | 20mins              | 2mins
V1      | V1S1     | 02-02-2015 09:32:00 | -1                  |                     | 18mins
V1      | V1S1     | 02-02-2015 09:50:00 | /home/search        | 5mins               | 5mins
V1      | V1S1     | 02-02-2015 09:55:00 | -1                  |                     |

V2      | V2S1     | 02-02-2015 09:10:00 | /home               | 5mins               | 5mins
V2      | V2S1     | 02-02-2015 09:15:00 | /home/apps          |                     |

V2      | V2S2     | 02-02-2015 09:20:00 | /home/news          | 3mins               | 3mins
V2      | V2S2     | 02-02-2015 09:23:00 | /home/news/internal |                     |

How can we do this in Netezza, or in any SQL query?

I need to compute it with a query like this:

SELECT vstr_id, 
    sessn_id, 
    sessn_ts, 
    wbpg_nm, 
    ???????? AS wanted_time_on_page, 
    extract(epoch from (lag(event_ts) over (partition by vstr_id, sessn_id order by event_ts DESC) - event_ts)) AS currently_known_time_on_page 
    from web_event; 

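The main difference between wanted_time_on_page and currently_known_time_on_page is that wanted_time_on_page should be computed with the '-1' rows excluded, except for the last '-1' row of a session, which still marks the end of the final page.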

Answers

2

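I don't know how big your data set is or how much RAM you have available. This query runs in memory. You can convert each individual CTE into a temp table to gain speed.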

WITH CTE_SessionOrder AS (
SELECT 
    sessn_id 
    ,sessn_ts  
    ,wbpg_nm 
    ,ROW_NUMBER() OVER(PARTITION BY sessn_id ORDER BY sessn_ts DESC) AS RowNum -- This is sorted Desc to get last row 
FROM 
    web_event 
) 
,CTE_KeepLastRow AS (
SELECT * 
FROM 
    CTE_SessionOrder 
WHERE 
    RowNum = 1 
    AND wbpg_nm = '-1' 
) 
,CTE_OtherRows AS (
SELECT * 
FROM 
    CTE_SessionOrder 
WHERE 
    wbpg_nm != '-1' 
) 
,CTE_FilteredData AS (
SELECT sessn_id,sessn_ts,wbpg_nm FROM CTE_KeepLastRow 
UNION 
SELECT sessn_id,sessn_ts,wbpg_nm FROM CTE_OtherRows 
) 
,CTE_FilterOrderedData AS (
SELECT 
    * 
    ,ROW_NUMBER() OVER(PARTITION BY sessn_id ORDER BY sessn_ts) AS RowNum -- Now Ordered Asc 
FROM 
    CTE_FilteredData 
) 
,CTE_FinalData AS (
SELECT 
    D1.sessn_id 
    ,D1.sessn_ts  
    ,D1.wbpg_nm 
    ,DATEDIFF(mi,D1.sessn_ts,D2.sessn_ts) time_on_page 
FROM 
    CTE_FilterOrderedData D1 
    LEFT JOIN CTE_FilterOrderedData D2 
     ON D1.sessn_id = D2.sessn_id 
      AND D1.RowNum + 1 = D2.RowNum 
UNION 
SELECT 
    sessn_id 
    ,sessn_ts  
    ,wbpg_nm 
    ,CAST(NULL AS INT) time_on_page 
FROM 
    CTE_SessionOrder 
WHERE 
    RowNum != 1 
    AND wbpg_nm = '-1' 
) 
SELECT * 
FROM 
    CTE_FinalData 
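
The temp-table suggestion made before the query can be sketched roughly as follows. This is only an illustration of the idea for the first CTE, not a tested rewrite of the whole query. It assumes PostgreSQL/Netezza-style CREATE TEMP TABLE ... AS syntax (on SQL Server the equivalent would be SELECT ... INTO #name), and the table name tmp_session_order is a hypothetical name chosen for this example.

```sql
-- Materialize what CTE_SessionOrder computes, so later steps reuse it
-- instead of re-evaluating the window function.
-- tmp_session_order is a hypothetical name used only in this sketch.
CREATE TEMP TABLE tmp_session_order AS
SELECT sessn_id,
       sessn_ts,
       wbpg_nm,
       ROW_NUMBER() OVER (PARTITION BY sessn_id ORDER BY sessn_ts DESC) AS RowNum
FROM web_event;

-- Later steps then read from the temp table where the original
-- query reads from the CTE, for example the CTE_KeepLastRow filter:
SELECT sessn_id, sessn_ts, wbpg_nm
FROM tmp_session_order
WHERE RowNum = 1
  AND wbpg_nm = '-1';
```

Whether this is actually faster than the single CTE query depends on the data volume and the platform, so it is worth comparing both forms on real data.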
+0

Arleigh, the result set in your answer keeps only one '-1' row, whereas there are three in his expected result set and in the starting table – Matt

+0

I didn't realize you wanted those too. Updated the code to include them. Thanks. –

+0

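I was trying to figure out why the post was so crazily long, and then I realized I had not picked up the last -1, which I missed in my outer apply. I was going to adjust the outer apply, but I just added a CTE instead. You may want to look at using two row numbers and two self-references; adjusting the PARTITION BY in your ROW_NUMBER function can help you get the desired result faster. – Matt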

1

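I assume event_ts and sessn_ts are the same? In any case, here is a query that should work for you. It uses the OUTER APPLY technique to find the later rows in the table (> sessn_ts) that are not '-1' pages, and then takes the top result in ascending order.

Just change the table name to your own table.

This solution mainly uses OUTER APPLY, but it also uses a common table expression (CTE) to get the time of the last '-1' row, which is needed for the final page.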

;WITH cteMaxNeg1 AS (
    SELECT 
     sessn_id 
     ,MaxNeg1SessnTs = MAX(CASE WHEN we.webpg_nm = '-1' THEN we.sessn_ts ELSE NULL END) 
     ,MaxPageSessnTs = MAX(CASE WHEN we.webpg_nm <> '-1' THEN we.sessn_ts ELSE NULL END) 
    FROM 
     @WebEvents we 
    GROUP BY 
     sessn_id 
) 

SELECT 
    we.* 
    ,currently_known_time_on_page = ISNULL(LAG(we.sessn_ts) over (partition by we.vstr_id, we.sessn_id order by we.sessn_ts DESC) - we.sessn_ts,CAST(0 AS DATETIME)) 
    ,WantedTimeOnPage = CASE 
     WHEN we.sessn_ts = m.MaxPageSessnTs AND we.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,we.sessn_ts,m.MaxNeg1SessnTs) 
     WHEN we.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,we.sessn_ts,o.sessn_ts) 
     ELSE NULL 
    END 
FROM 
    @WebEvents we 
    LEFT JOIN cteMaxNeg1 m 
    ON we.sessn_id = m.sessn_id 
    OUTER APPLY (
     SELECT TOP 1 sessn_ts 
     FROM 
      @WebEvents i 
     WHERE 
      i.webpg_nm <> '-1' 
      AND i.sessn_id = we.sessn_id 
      AND i.sessn_ts > we.sessn_ts 

     ORDER BY 
      i.sessn_ts ASC 

    ) o 
ORDER BY 
    we.sessn_id 
    ,we.sessn_ts 

Here is a version that uses only a CTE and window functions:

;WITH cte AS (
    SELECT 
     * 
     ,RowNum = ROW_NUMBER() OVER (PARTITION BY sessn_id, IIF(webpg_nm = '-1',0,1) ORDER BY sessn_ts) 
     ,LastNeg1RowNum = ROW_NUMBER() OVER (PARTITION BY sessn_id, IIF(webpg_nm = '-1',0,1) ORDER BY sessn_ts DESC) 
    FROM 
     @WebEvents 
) 

SELECT 
    c1.* 
    ,WantedTimeOnPage = CASE 
     WHEN c1.LastNeg1RowNum = 1 AND c1.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,c1.sessn_ts,c3.sessn_ts) 
     WHEN c1.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,c1.sessn_ts,c2.sessn_ts) 
     ELSE NULL 
    END 
FROM 
    cte c1 
    LEFT JOIN cte c2 
    ON c1.sessn_id = c2.sessn_id 
    AND (c1.RowNum + 1) = c2.RowNum 
    AND c2.webpg_nm <> '-1' 
    LEFT JOIN cte c3 
    ON c1.sessn_id = c3.sessn_id 
    AND c3.LastNeg1RowNum = 1 
    AND c3.webpg_nm = '-1' 
ORDER BY 
    c1.sessn_id 
    ,c1.sessn_ts 
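
The two queries above rely on SQL Server syntax (table variables, OUTER APPLY, IIF, DATEDIFF), which may not be available on Netezza. As a rough sketch of the same idea in PostgreSQL/Netezza-style SQL, the page rows can be placed in their own window partition so that LEAD skips the '-1' rows, with the last timestamp of the session as the fallback for the final page. It assumes that LEAD and MAX can be used as window functions, that extract(epoch from ...) works on the interval as in the question's own query, and that the columns are named as in the question's web_event table. It has not been run on Netezza.

```sql
-- Sketch only: column and table names are taken from the question.
SELECT vstr_id,
       sessn_id,
       sessn_ts,
       wbpg_nm,
       CASE
           WHEN wbpg_nm <> '-1' THEN
               extract(epoch FROM (
                   COALESCE(
                       -- Next page load, ignoring '-1' rows: pages and
                       -- '-1' rows fall into separate partitions.
                       LEAD(sessn_ts) OVER (
                           PARTITION BY vstr_id, sessn_id,
                                        CASE WHEN wbpg_nm = '-1' THEN 0 ELSE 1 END
                           ORDER BY sessn_ts),
                       -- Last page of the session: fall back to the
                       -- latest event in the session (the final '-1', if any).
                       MAX(sessn_ts) OVER (PARTITION BY vstr_id, sessn_id)
                   ) - sessn_ts
               )) / 60
           ELSE NULL  -- '-1' rows get no time on page
       END AS wanted_time_on_page  -- in minutes
FROM web_event
ORDER BY sessn_id, sessn_ts;
```

With the sample data this should give 10, 20 and 5 minutes for the three V1S1 pages. A last page with no event after it, such as /home/apps, gets 0 rather than NULL, which matches the question's note that the last page may have 0 seconds.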

Test data I used for these solutions, taken from your example:

DECLARE @WebEvents AS TABLE (vstr_id CHAR(2), sessn_id CHAR(5), sessn_ts DATETIME, webpg_nm VARCHAR(100)) 

INSERT INTO @WebEvents (vstr_id, sessn_id, sessn_ts, webpg_nm) 
VALUES 
('V1','V1S1','02-02-2015 09:20:00','/home/login') 
,('V1','V1S1','02-02-2015 09:22:00','-1') 
,('V1','V1S1','02-02-2015 09:30:00','/home/contacts') 
,('V1','V1S1','02-02-2015 09:32:00','-1') 
,('V1','V1S1','02-02-2015 09:50:00','/home/search') 
,('V1','V1S1','02-02-2015 09:55:00','-1') 
,('V2','V2S1','02-02-2015 09:10:00','/home') 
,('V2','V2S1','02-02-2015 09:15:00','/home/apps') 
,('V2','V2S2','02-02-2015 09:20:00','/home/news') 
,('V2','V2S2','02-02-2015 09:23:00','/home/news/internal') 
+0

Thanks Matt, this helps. – RAJESH

+0

@RAJESH You're welcome – Matt