2017-08-15 63 views
1

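Finding unique entities with multiple UUID identifiers in Redshift

I have an events table that contains multiple types of UUID per user. We would like to be able to stitch all of these UUIDs together to get the highest possible definition of a single user.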

For example:

UUID1 | UUID2
    1 | a
    1 | a
    2 | a
    2 | b
    3 | c
    4 | c

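There are 2 users here: the first with uuid1 = {1,2} and uuid2 = {a,b}, the second with uuid1 = {3,4} and uuid2 = {c}. These chains can potentially be very long. There are no intersections (i.e. 1c does not exist), and all rows are ordered by timestamp.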

Is there a way to generate these unique "visitor" identifiers in Redshift without creating a massive query with many joins?

Thanks in advance!

+1

Can chains intersect? For example, a "1c" entry for the first user... And is there a timestamp column to order the events in a chain? – AlexYes

+0

Thanks for the questions: no and yes, respectively. I have edited the question to clarify. – user3136936

Answer

0

Create a test data table

-- DROP TABLE uuid_test; 
CREATE TEMP TABLE uuid_test AS 
      SELECT 1 row_id, 1::int uuid1, 'a'::char(1) uuid2 
UNION ALL SELECT 2 row_id, 1::int uuid1, 'a'::char(1) uuid2 
UNION ALL SELECT 3 row_id, 2::int uuid1, 'a'::char(1) uuid2 
UNION ALL SELECT 4 row_id, 2::int uuid1, 'b'::char(1) uuid2 
UNION ALL SELECT 5 row_id, 3::int uuid1, 'c'::char(1) uuid2 
UNION ALL SELECT 6 row_id, 4::int uuid1, 'c'::char(1) uuid2 
UNION ALL SELECT 7 row_id, 4::int uuid1, 'd'::char(1) uuid2 
UNION ALL SELECT 8 row_id, 5::int uuid1, 'e'::char(1) uuid2 
UNION ALL SELECT 9 row_id, 6::int uuid1, 'e'::char(1) uuid2 
UNION ALL SELECT 10 row_id, 6::int uuid1, 'f'::char(1) uuid2 
UNION ALL SELECT 11 row_id, 7::int uuid1, 'f'::char(1) uuid2 
UNION ALL SELECT 12 row_id, 8::int uuid1, 'g'::char(1) uuid2 
UNION ALL SELECT 13 row_id, 8::int uuid1, 'h'::char(1) uuid2 
; 

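The actual problem is solved by using a strict ordering to find every place where the unique user changes, capturing those change points as a lookup table, and then applying that lookup table to the original data.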

-- Create lookup table with a from-to range of IDs for each unique user 
WITH unique_user AS (

-- Calculate the end of the id range using LEAD() to look ahead 
-- Use an inline MAX() to find the ending ID for the last entry 

SELECT row_id AS from_id 
    , NVL(LEAD(row_id,1) OVER (ORDER BY row_id)-1, (SELECT MAX(row_id) FROM uuid_test)) AS to_id 
    , unique_uuid 

-- Mark a unique user change when both UUIDs differ from the previous row
FROM (SELECT row_id 
      ,CASE WHEN NVL(LAG(uuid1,1) OVER (ORDER BY row_id), 0) <> uuid1 
        AND NVL(LAG(uuid2,1) OVER (ORDER BY row_id), '') <> uuid2 
      THEN MD5(uuid1||uuid2) 
      ELSE NULL END unique_uuid 
     FROM uuid_test) t 
WHERE unique_uuid IS NOT NULL 
ORDER BY row_id 
) 

-- Apply the unique user value to each row using a range join to the lookup table 
SELECT a.row_id, a.uuid1, a.uuid2, b.unique_uuid 
FROM uuid_test AS a 
JOIN unique_user AS b 
    ON a.row_id BETWEEN b.from_id AND b.to_id 
ORDER BY a.row_id 
; 

Here is the output

 row_id | uuid1 | uuid2 |           unique_uuid
--------+-------+-------+----------------------------------
      1 |     1 | a     | efaa153b0f682ae5170a3184fa0df28c
      2 |     1 | a     | efaa153b0f682ae5170a3184fa0df28c
      3 |     2 | a     | efaa153b0f682ae5170a3184fa0df28c
      4 |     2 | b     | efaa153b0f682ae5170a3184fa0df28c
      5 |     3 | c     | 5fcfcb7df376059d0075cb892b2cc37f
      6 |     4 | c     | 5fcfcb7df376059d0075cb892b2cc37f
      7 |     4 | d     | 5fcfcb7df376059d0075cb892b2cc37f
      8 |     5 | e     | 18a368e1052b5aa0388ef020dd9a1e20
      9 |     6 | e     | 18a368e1052b5aa0388ef020dd9a1e20
     10 |     6 | f     | 18a368e1052b5aa0388ef020dd9a1e20
     11 |     7 | f     | 18a368e1052b5aa0388ef020dd9a1e20
     12 |     8 | g     | 321fcc2447163a81d470b9353e394121
     13 |     8 | h     | 321fcc2447163a81d470b9353e394121
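
As a possible variation (not part of the answer above), the same change-point idea can be sketched without the lookup table and range join, by flagging each row where a new user starts and taking a running sum of those flags. The column names `is_new_user` and `user_seq` are hypothetical, and the sketch assumes the same `uuid_test` table and the same guarantees as the question: chains never intersect and `row_id` gives a strict ordering in which each user's rows are contiguous.

```sql
-- Sketch: number the users with a running sum over "new user" flags.
-- is_new_user and user_seq are illustrative names, not from the original answer.
SELECT row_id
     , uuid1
     , uuid2
       -- Redshift requires an explicit frame when an aggregate window function has ORDER BY
     , SUM(is_new_user) OVER (ORDER BY row_id ROWS UNBOUNDED PRECEDING) AS user_seq
FROM (SELECT row_id
           , uuid1
           , uuid2
             -- A row continues the current user if either UUID matches the previous row.
             -- On the first row both LAG() values are NULL, so the ELSE branch flags it as new.
           , CASE WHEN LAG(uuid1,1) OVER (ORDER BY row_id) = uuid1
                    OR LAG(uuid2,1) OVER (ORDER BY row_id) = uuid2
                  THEN 0
                  ELSE 1 END AS is_new_user
      FROM uuid_test) t
ORDER BY row_id
;
```

On the test data this should group the rows the same way as the output above (rows 1-4, 5-7, 8-11 and 12-13), but it labels each user with a sequence number rather than an MD5 hash. If a hash-style identifier is needed, the first row of each group could still be hashed as in the original query.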