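My solution starts by generating all the pairs of apps that are of interest. This is the driver subquery.
It then joins in the original data for each app in the pair.
Finally, it uses count(distinct) to count the distinct users that match between the two lists.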
select pairs.app1, pairs.app2,
       COUNT(distinct case when tleft.user = tright.user then tleft.user end) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2
group by pairs.app1, pairs.app2
You can also move the conditional comparison from the count into the join and use a plain count(distinct):
select pairs.app1, pairs.app2,
       -- count tright.user: the outer join leaves it NULL when the user has no row for app2
       COUNT(distinct tright.user) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2 and
        tright.user = tleft.user
group by pairs.app1, pairs.app2
I prefer the first approach because it is more explicit about what is being counted.
This is standard SQL, so it should work on Vertica.
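To see what the two forms return, here is a minimal sketch that runs them against a tiny in-memory table. It rests on several assumptions: SQLite (via Python's `sqlite3` module) stands in for Vertica, the sample rows are made up, and the names `SAMPLE_ROWS`, `PAIRS`, `QUERY_CASE`, `QUERY_JOIN` and `run` are only for illustration. The join-based form counts `tright.user`, which the outer join leaves NULL for users that have no row in the second app.

```python
import sqlite3

# Hypothetical sample rows (app, user) for the table t; not taken from the question.
SAMPLE_ROWS = [
    ("A", "u1"), ("A", "u2"), ("A", "u3"),
    ("B", "u2"), ("B", "u3"),
    ("C", "u4"),
]

# Driver subquery: every unordered pair of apps, including each app paired with itself.
PAIRS = """
(select t1.app as app1, t2.app as app2
 from (select distinct app from t) t1 cross join
      (select distinct app from t) t2
 where t1.app <= t2.app
) pairs
"""

# Form 1: join on app only and compare users inside count(distinct case ...).
QUERY_CASE = f"""
select pairs.app1, pairs.app2,
       count(distinct case when tleft.user = tright.user then tleft.user end) as NumCommonUsers
from {PAIRS} left outer join
     t tleft on tleft.app = pairs.app1 left outer join
     t tright on tright.app = pairs.app2
group by pairs.app1, pairs.app2
order by pairs.app1, pairs.app2
"""

# Form 2: match users in the join; unmatched rows carry a NULL tright.user,
# which count(distinct ...) ignores.
QUERY_JOIN = f"""
select pairs.app1, pairs.app2,
       count(distinct tright.user) as NumCommonUsers
from {PAIRS} left outer join
     t tleft on tleft.app = pairs.app1 left outer join
     t tright on tright.app = pairs.app2 and
                 tright.user = tleft.user
group by pairs.app1, pairs.app2
order by pairs.app1, pairs.app2
"""


def run(query, rows=SAMPLE_ROWS):
    """Load rows into a fresh in-memory table t and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (app text, user text)")
        conn.executemany("insert into t (app, user) values (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for label, query in (("case form", QUERY_CASE), ("join form", QUERY_JOIN)):
        print(label)
        for row in run(query):
            print("  ", row)
```

On this sample both forms give the same counts, and pairs with no shared users (such as A and C) still appear with a count of 0 because of the outer joins. Pairing an app with itself simply yields its number of distinct users.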
Great, Gordon.... I will try it and let you know. Thanks – user1570210