3

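Redshift PostgreSQL Distinct ON Operator

I have a data set that I want to parse in order to look at multi-touch attribution. The data set consists of leads who responded to a marketing campaign, together with their marketing source.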

Each lead can respond to multiple campaigns, and I want their first marketing source and their last marketing source in the same table.

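I was thinking I could create two tables and then use a select statement over both. The first table would hold each person's most recent marketing source (using email as their unique ID).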

create table temp.multitouch1 as (
select distinct on (email) email, date, market_source as last_source 
from sf.campaignmember 
where date >= '2016-01-01' order by email, date desc); 

Then I would create a table with deduplicated emails, but this time for the first source.

create table temp.multitouch2 as (
select distinct on (email) email, date, market_source as first_source 
from sf.campaignmember 
where date >= '2016-01-01' order by email, date asc); 

Finally, I wanted to simply select the email and join the first and last market sources to it, each in its own column.

select a.email, a.last_source, b.first_source, a.date 
from temp.multitouch1 a 
left join temp.multitouch2 b on b.email = a.email; 

Since DISTINCT ON does not work on Redshift's version of PostgreSQL, I was hoping someone had an idea for solving this another way.

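EDIT 2/22: For more context, I'm dealing with people and the campaigns they have responded to. Each record is a "campaign response", and every person can have multiple campaign responses across multiple campaigns. I'm trying to write a select statement that deduplicates by person and then has one column for the first campaign/marketing source they responded to and another for the last campaign/marketing source they responded to.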

EDIT 2/24: The ideal output is a table with 4 columns: email, last_source, first_source, date.

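The first and last source columns would be the same for people with only 1 campaign member record, and different for everyone who has more than 1 campaign member record.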

+0

Are you sure you are using 'postgresql-8.0'? –

+0

According to this page in the AWS documentation, I am: http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html – Berra2k

Answers

2

I believe you can use ROW_NUMBER() inside CASE expressions, like this:

SELECT 
     email 
    , MIN(first_source) AS first_source 
    , MIN(date) AS first_date 
    , MAX(last_source) AS last_source 
    , MAX(date) AS last_date 
FROM (
     SELECT 
      email 
      , date 
      -- market_source is kept only on each email's earliest row 
      , CASE 
        WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC) = 1 THEN market_source 
        ELSE NULL 
      END AS first_source 
      -- market_source is kept only on each email's latest row 
      , CASE 
        WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1 THEN market_source 
        ELSE NULL 
      END AS last_source 
     FROM sf.campaignmember 
     WHERE date >= '2016-01-01' 
    ) s 
WHERE first_source IS NOT NULL 
     OR last_source IS NOT NULL 
GROUP BY 
     email 

Tested here: SQL Fiddle

PostgreSQL 9.3 Schema Setup:

CREATE TABLE campaignmember 
    (email varchar(3), date timestamp, market_source varchar(1)) 
; 

INSERT INTO campaignmember 
    (email, date, market_source) 
VALUES 
    ('[email protected]', '2016-01-02 00:00:00', 'x'), 
    ('[email protected]', '2016-01-03 00:00:00', 'y'), 
    ('[email protected]', '2016-01-04 00:00:00', 'z'), 
    ('[email protected]', '2016-01-02 00:00:00', 'x') 
; 

Query 1:

SELECT 
     email 
    , MIN(first_source) AS first_source 
    , MIN(date) AS first_date 
    , MAX(last_source) AS last_source 
    , MAX(date) AS last_date 
FROM (
     SELECT 
      email 
      , date 
      , CASE 
        WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC) = 1 THEN market_source 
        ELSE NULL 
      END AS first_source 
      , CASE 
        WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1 THEN market_source 
        ELSE NULL 
      END AS last_source 
     FROM campaignmember 
     WHERE date >= '2016-01-01' 
    ) s 
WHERE first_source IS NOT NULL 
     OR last_source IS NOT NULL 
GROUP BY 
     email 

Results

| email             | first_source | first_date                | last_source | last_date                 |
|-------------------|--------------|---------------------------|-------------|---------------------------|
| [email protected] | x            | January, 02 2016 00:00:00 | z           | January, 04 2016 00:00:00 |
| [email protected] | x            | January, 02 2016 00:00:00 | x           | January, 02 2016 00:00:00 |

And a small extension to the request: counting the number of contact points.

SELECT 
     email 
    , MIN(first_source) AS first_source 
    , MIN(date) AS first_date 
    , MAX(last_source) AS last_source 
    , MAX(date) AS last_date 
    , MAX(numof) AS Numberof_Contacts 
FROM (
     SELECT 
      email 
      , date 
      , CASE 
        WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC) = 1 THEN market_source 
        ELSE NULL 
      END AS first_source 
      , CASE 
        WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1 THEN market_source 
        ELSE NULL 
      END AS last_source 
      , COUNT(*) OVER (PARTITION BY email) as numof 
     FROM campaignmember 
     WHERE date >= '2016-01-01' 
    ) s 
WHERE first_source IS NOT NULL 
     OR last_source IS NOT NULL 
GROUP BY 
     email 
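
A possible variation on the queries above, offered only as an untested sketch: instead of CASE around ROW_NUMBER(), the first and last sources can be read with the FIRST_VALUE window function, ordering the partition ascending for the first touch and descending for the last. It assumes the same campaignmember table and columns as the fiddle, and it spells out the frame clause because Redshift expects one when a window has an ORDER BY. If two rows for one email share the same date, which source is picked is arbitrary.

```sql
-- Sketch only: one row per email with first and last marketing source.
SELECT DISTINCT
      email
    , FIRST_VALUE(market_source) OVER (
          PARTITION BY email ORDER BY date ASC
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
      ) AS first_source
    , MIN(date) OVER (PARTITION BY email) AS first_date
    , FIRST_VALUE(market_source) OVER (
          PARTITION BY email ORDER BY date DESC
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
      ) AS last_source
    , MAX(date) OVER (PARTITION BY email) AS last_date
FROM campaignmember
WHERE date >= '2016-01-01';
```

Every row in a partition gets the same values, so SELECT DISTINCT collapses them to one row per email without a GROUP BY.
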
+0

Worked perfectly! Thanks a lot! – Berra2k

0

You can use the good old left-join groupwise maximum.

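SELECT DISTINCT c1.email, c1.date, c1.market_source 
FROM sf.campaignmember c1 
    -- c2: any older record for the same email (id breaks ties on date) 
    LEFT JOIN sf.campaignmember c2 
    ON c1.email = c2.email AND c2.date >= '2016-01-01' 
     AND (c2.date < c1.date OR (c2.date = c1.date AND c2.id < c1.id)) 
    -- c3: any newer record for the same email 
    LEFT JOIN sf.campaignmember c3 
    ON c1.email = c3.email AND c3.date >= '2016-01-01' 
     AND (c3.date > c1.date OR (c3.date = c1.date AND c3.id > c1.id)) 
WHERE c1.date >= '2016-01-01' 
     AND (c2.email IS NULL OR c3.email IS NULL) 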

This assumes you have a unique id column; if (date, email) is unique, the id is not needed.

+0

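Wouldn't I want to select c2.market_source and then c3.market_source instead of just c1.market_source? Another complication is that some people have multiple market_source records because they have responded to multiple campaigns, while others have only one. – Berra2k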

+0

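@Berra2k You want to select the records from c1 that either have no older record (c2) or no newer record (c3). If an email has only one record, it will still be returned. Please try it and see. –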
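
For the first and last source side by side in separate columns, the same anti-join idea can be sketched roughly as below. This is an untested illustration rather than the answerer's query: the aliases f, l, older and newer are made up here, and it assumes (email, date) is unique, since tied dates would produce more than one row per email.

```sql
-- f: a row with no older row for the same email (first touch)
-- l: a row with no newer row for the same email (last touch)
SELECT
      f.email
    , l.market_source AS last_source
    , f.market_source AS first_source
    , l.date          AS date
FROM sf.campaignmember f
    LEFT JOIN sf.campaignmember older
        ON  older.email = f.email
        AND older.date >= '2016-01-01'
        AND older.date < f.date
    JOIN sf.campaignmember l
        ON  l.email = f.email
        AND l.date >= '2016-01-01'
    LEFT JOIN sf.campaignmember newer
        ON  newer.email = l.email
        AND newer.date >= '2016-01-01'
        AND newer.date > l.date
WHERE f.date >= '2016-01-01'
  AND older.email IS NULL   -- nothing earlier than f
  AND newer.email IS NULL;  -- nothing later than l
```

An email with a single record matches itself as both f and l, so first_source and last_source come out equal for it.
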