2016-06-08

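Counting many different combinations of columns with multiple conditions

I have a table named fact_interactions that holds a running history of customer interactions. Each time a customer is contacted, a new record is created with specific details about the interaction. Here is an example: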

inter_id |customer_id |business_id |department_id |datetime_local      |outcome_id |
---------|------------|------------|--------------|--------------------|-----------|
46032383 |1           |112         |1916          |2015-01-14 19:54:20 |48         |
55740863 |2           |2           |3358          |2015-05-06 12:02:12 |19         |
49512895 |3           |160         |396           |2015-01-22 11:57:17 |19         |
51822751 |3           |160         |396           |2015-01-28 13:46:19 |19         |
23533190 |4           |132         |425           |2015-03-26 12:42:24 |19         |
69354240 |5           |164         |3061          |2015-03-30 11:01:43 |19         |
61417848 |5           |164         |3061          |2015-04-01 14:36:30 |19         |
74948424 |5           |164         |3061          |2015-04-28 15:12:42 |19         |
75303296 |5           |164         |3061          |2015-04-29 13:51:02 |10         |
76071776 |5           |164         |3061          |2015-05-01 09:18:39 |10         |

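For each record, I need to count all rows that match it on multiple conditions, across multiple time windows. Here is a sample of my query, showing a few of the different subqueries I currently use: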

SELECT 
    inter_id, 
    (SELECT COUNT(*) FROM fact_interactions B 
     WHERE B.customer_id = A.customer_id 
     AND B.business_id = A.business_id 
     AND B.department_id = A.department_id 
     AND B.datetime_local::date = A.datetime_local::date 
     AND B.datetime_local < A.datetime_local) AS cnt_samesamesame_day0, 
    (SELECT COUNT(*) FROM fact_interactions B 
     WHERE B.customer_id = A.customer_id 
     AND B.business_id = A.business_id 
     AND B.department_id <> A.department_id 
     AND B.datetime_local::date = A.datetime_local::date 
     AND B.datetime_local < A.datetime_local) AS cnt_samesamediff_day0, 
    (SELECT COUNT(*) FROM fact_interactions B 
     WHERE B.customer_id = A.customer_id 
     AND B.business_id <> A.business_id 
     AND B.department_id <> A.department_id 
     AND B.datetime_local::date = A.datetime_local::date 
     AND B.datetime_local < A.datetime_local) AS cnt_samediffdiff_day0 
FROM fact_interactions A; 

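In total I have 180 subqueries whose counts I am trying to compute. So if fact_interactions has 1,000,000 records, the output will also have 1,000,000 records, each with inter_id plus 180 count columns. To explain a bit further, here are some examples of how those 180 count subqueries would be named: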

  • cnt_samesamesame_day0 / day3 / day7 / ...
  • cnt_samesamediff_day0 / day3 / day7 / ...
  • cnt_samediffdiff_day0 / day3 / day7 / ...

The query does finish, but as you can imagine it takes a long time. The counting alone takes a minute.

It is hard to include a sample of the output because it is very sparse.

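Any suggestions on how to do this more efficiently? Specific examples are much appreciated, but even a better general approach would be amazing. Thanks!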

(I am trying to implement this on an Amazon Redshift cluster.)

Answer

I would suggest learning about window functions. For example:

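SELECT inter_id, 
     -- partition on the calendar day, not the full timestamp
     COUNT(*) OVER (PARTITION BY customer_id, business_id, department_id, datetime_local::date 
         ORDER BY datetime_local 
         -- count earlier rows only; assumes no tied timestamps within a partition
         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING 
        ) as cnt_samesamesame_day0, 
     . . . 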

The other columns would probably have a similar structure.
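One way the "different" columns could follow from the same structure is by subtraction: count earlier same-day rows at a coarser partition and subtract the counts from finer partitions (inclusion-exclusion). The sketch below checks that idea against the question's correlated subqueries. It makes several assumptions: it runs on SQLite through Python's sqlite3 module as a stand-in for Redshift (so `date(...)` replaces `::date`, and a SQLite build with window function support is needed), the customer 9 rows are made up so that the counts are non-zero, the helper names `prior` and `subquery` are hypothetical, and timestamps are assumed to be unique per customer, since a `ROWS` frame treats ties differently from a strict `<`.

```python
import sqlite3

# (inter_id, customer_id, business_id, department_id, datetime_local)
# Customer 5 rows come from the question; customer 9 rows are made-up test data.
ROWS = [
    (69354240, 5, 164, 3061, "2015-03-30 11:01:43"),
    (61417848, 5, 164, 3061, "2015-04-01 14:36:30"),
    (1, 9, 10, 100, "2015-06-01 08:00:00"),
    (2, 9, 10, 100, "2015-06-01 09:00:00"),
    (3, 9, 10, 200, "2015-06-01 10:00:00"),
    (4, 9, 20, 100, "2015-06-01 11:00:00"),
    (5, 9, 20, 300, "2015-06-01 12:00:00"),
    (6, 9, 10, 100, "2015-06-01 13:00:00"),
    (7, 9, 10, 100, "2015-06-02 09:00:00"),
]


def prior(*cols):
    """Window expression: number of earlier same-day rows sharing `cols`."""
    partition = ", ".join(cols + ("date(datetime_local)",))
    return (
        f"COUNT(*) OVER (PARTITION BY {partition} ORDER BY datetime_local "
        "ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)"
    )


# One pass over the table; the "diff" counts are derived by subtraction.
WINDOW_SQL = f"""
WITH w AS (
    SELECT inter_id,
           {prior('customer_id')} AS n_cust,
           {prior('customer_id', 'business_id')} AS n_biz,
           {prior('customer_id', 'department_id')} AS n_dept,
           {prior('customer_id', 'business_id', 'department_id')} AS n_both
    FROM fact_interactions
)
SELECT inter_id,
       n_both                           AS cnt_samesamesame_day0,
       n_biz - n_both                   AS cnt_samesamediff_day0,
       n_cust - n_biz - n_dept + n_both AS cnt_samediffdiff_day0
FROM w
ORDER BY inter_id
"""


def subquery(biz_op, dept_op):
    """Correlated subquery in the style of the question, for comparison."""
    return f"""(SELECT COUNT(*) FROM fact_interactions B
        WHERE B.customer_id = A.customer_id
          AND B.business_id {biz_op} A.business_id
          AND B.department_id {dept_op} A.department_id
          AND date(B.datetime_local) = date(A.datetime_local)
          AND B.datetime_local < A.datetime_local)"""


SUBQUERY_SQL = f"""
SELECT inter_id,
       {subquery('=', '=')},
       {subquery('=', '<>')},
       {subquery('<>', '<>')}
FROM fact_interactions A
ORDER BY inter_id
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE fact_interactions (inter_id INTEGER, customer_id INTEGER, "
    "business_id INTEGER, department_id INTEGER, datetime_local TEXT)"
)
con.executemany("INSERT INTO fact_interactions VALUES (?, ?, ?, ?, ?)", ROWS)

window_counts = con.execute(WINDOW_SQL).fetchall()
subquery_counts = con.execute(SUBQUERY_SQL).fetchall()
print("window result matches subqueries:", window_counts == subquery_counts)
```

With this shape, each additional "same/diff" combination costs a little arithmetic on a handful of window counts rather than another scan of the table. Whether it is actually faster on a given Redshift cluster would need to be measured, and the day3/day7 columns would need their own frame or filter logic, which is not shown here.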