2015-06-16 56 views
0

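Calculating the overlap between groups

I have a table with two columns of interest, item_id and bucket_id. bucket_id has a fixed number of values, and I can list them if needed.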

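Each item_id can appear multiple times, but every occurrence has a different bucket_id value. For example, item_id 123 can appear twice in the table, once under bucket_id A and once under bucket_id B.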

My goal is to determine how much overlap exists between each pair of bucket_id values and to display it as an N-by-N matrix.

For example, consider the following small sample table:

item_id  bucket_id 
========= =========== 
111   A 
111   B 
111   C 

222   B 
222   D 

333   A 
333   C 

444   C 

So, for this data set, buckets A and B have one item_id in common, buckets C and D have no items in common, and so on.

I would like to format the table above as follows:

     A   B   C   D
===================
A    2   1   2   0
B    1   2   1   1
C    2   1   3   0
D    0   1   0   1

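In the table above, the intersection of a row and a column tells you how many item_id values exist under both bucket_id values. For example, where row A meets column C we have 2, because 2 items exist in both bucket_id A and bucket_id C. Since the intersection of X and Y is the same as the intersection of Y and X, the table is mirrored across the diagonal.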
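To make the definition of a cell concrete, here is a minimal Python sketch that rebuilds the matrix from the sample rows using set intersections. The function name overlap_matrix and the in-memory rows list are assumptions made for illustration only; they are not part of the original table or of any answer.

```python
from collections import defaultdict
from itertools import product

# Sample rows from the question: (item_id, bucket_id)
rows = [
    (111, "A"), (111, "B"), (111, "C"),
    (222, "B"), (222, "D"),
    (333, "A"), (333, "C"),
    (444, "C"),
]

def overlap_matrix(rows):
    """Return (sorted bucket list, {(row_bucket, col_bucket): shared item count})."""
    items_by_bucket = defaultdict(set)
    for item_id, bucket_id in rows:
        items_by_bucket[bucket_id].add(item_id)
    buckets = sorted(items_by_bucket)
    # Each cell is the size of the intersection of the two buckets' item sets.
    counts = {
        (x, y): len(items_by_bucket[x] & items_by_bucket[y])
        for x, y in product(buckets, repeat=2)
    }
    return buckets, counts

buckets, counts = overlap_matrix(rows)
print("   " + "  ".join(buckets))
for x in buckets:
    print(x + "  " + "  ".join(str(counts[x, y]) for y in buckets))
```

The diagonal cells are simply the number of items in each bucket, which is why C/C is 3 in the desired output.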

I imagine the query involves PIVOT, but I can't for the life of me figure out how to get it working.

+0

What does the final table represent (for example, what does the value 2 at row A, column C mean)? – John

+1

@John - I added some explanation below the table. –

+0

This is a kind of matrix. It should be solvable, though. – Ravi

Answers

1

You can use a simple manual pivot (a self-join on item_id plus conditional aggregation):

SELECT t1.bucket_id, 
     SUM(CASE WHEN t2.bucket_id = 'A' THEN 1 ELSE 0 END) AS A, 
     SUM(CASE WHEN t2.bucket_id = 'B' THEN 1 ELSE 0 END) AS B, 
     SUM(CASE WHEN t2.bucket_id = 'C' THEN 1 ELSE 0 END) AS C, 
     SUM(CASE WHEN t2.bucket_id = 'D' THEN 1 ELSE 0 END) AS D 
FROM table1 t1 
JOIN table1 t2 ON t1.item_id = t2.item_id 
GROUP BY t1.bucket_id 
ORDER BY 1 
; 

Or you can use the Oracle PIVOT clause (works on 11.2 and later):

SELECT * FROM (
    SELECT t1.bucket_id AS Y_bid, 
      t2.bucket_id AS x_bid 
    FROM table1 t1 
    JOIN table1 t2 ON t1.item_id = t2.item_id 
) 
PIVOT (
    count(*) FOR x_bid in ('A','B','C','D') 
) 
ORDER BY 1 
; 

Live example: http://sqlfiddle.com/#!4/39d30/7
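If you have no Oracle instance at hand, the first query (the conditional aggregation one) can be tried locally. The sketch below is only an illustration: it assumes an in-memory SQLite database through Python's standard sqlite3 module, loads the sample rows from the question into a table named table1, and runs the query unchanged. The PIVOT clause itself is Oracle syntax and is not exercised here.

```python
import sqlite3

# In-memory SQLite database standing in for the real table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (item_id INTEGER, bucket_id TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (111, "A"), (111, "B"), (111, "C"),
        (222, "B"), (222, "D"),
        (333, "A"), (333, "C"),
        (444, "C"),
    ],
)

# Self-join on item_id, then count the matches per target bucket.
query = """
SELECT t1.bucket_id,
       SUM(CASE WHEN t2.bucket_id = 'A' THEN 1 ELSE 0 END) AS A,
       SUM(CASE WHEN t2.bucket_id = 'B' THEN 1 ELSE 0 END) AS B,
       SUM(CASE WHEN t2.bucket_id = 'C' THEN 1 ELSE 0 END) AS C,
       SUM(CASE WHEN t2.bucket_id = 'D' THEN 1 ELSE 0 END) AS D
FROM table1 t1
JOIN table1 t2 ON t1.item_id = t2.item_id
GROUP BY t1.bucket_id
ORDER BY 1
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Note that the counts are only correct if each (item_id, bucket_id) pair appears at most once in the table, as the question states. With duplicate rows the self-join would count a shared item more than once.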

0

I believe this should get you the data you need. You can then pivot the result programmatically (or in Excel, etc.).

-- This gets the distinct pairs of buckets 
select distinct 
    a.name, 
    b.name 
from 
    bucket a 
    cross join bucket b
where 
    a.name < b.name 
order by 
    a.name, 
    b.name 

+ --------- + --------- + 
| name  | name  | 
+ --------- + --------- + 
| A   | B   | 
| A   | C   | 
| A   | D   | 
| B   | C   | 
| B   | D   | 
| C   | D   | 
+ --------- + --------- + 
6 rows 

-- This gets the distinct pairs of buckets with the counts you are looking for 
select distinct 
    a.name, 
    b.name, 
    count(distinct bi.item_id) 
from 
    bucket a 
    cross join bucket b
    left outer join bucket_item ai on ai.bucket_name = a.name 
    left outer join bucket_item bi on bi.bucket_name = b.name and ai.item_id = bi.item_id 
where 
    a.name < b.name 
group by 
    a.name, 
    b.name 
order by 
    a.name, 
    b.name 

+ --------- + --------- + ------------------------------- + 
| name  | name  | count(distinct bi.item_id)  | 
+ --------- + --------- + ------------------------------- + 
| A   | B   | 2        | 
| A   | C   | 1        | 
| A   | D   | 0        | 
| B   | C   | 2        | 
| B   | D   | 0        | 
| C   | D   | 0        | 
+ --------- + --------- + ------------------------------- + 
6 rows 
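As a rough idea of what "pivot programmatically" could look like, here is a small Python sketch that turns the six pair rows above into a symmetric matrix. The helper name to_matrix is hypothetical, and the pair counts are copied by hand from the result set shown above. Because the query only returns pairs with a.name < b.name, the diagonal (a bucket against itself) is not available and is left as None here.

```python
# Pair counts copied from the result set above: (bucket, bucket, shared items)
pair_counts = [
    ("A", "B", 2),
    ("A", "C", 1),
    ("A", "D", 0),
    ("B", "C", 2),
    ("B", "D", 0),
    ("C", "D", 0),
]

def to_matrix(pair_counts):
    """Build a symmetric N x N matrix; diagonal cells stay None."""
    buckets = sorted({name for a, b, _ in pair_counts for name in (a, b)})
    index = {name: i for i, name in enumerate(buckets)}
    matrix = [[None] * len(buckets) for _ in buckets]
    for a, b, n in pair_counts:
        # The overlap of (a, b) equals the overlap of (b, a), so mirror it.
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return buckets, matrix

buckets, matrix = to_matrix(pair_counts)
print("   " + "  ".join(buckets))
for name, row in zip(buckets, matrix):
    print(name + "  " + "  ".join("-" if v is None else str(v) for v in row))
```

If the diagonal is needed as well, one option would be a separate count of items per bucket merged in afterwards.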

Below is the whole example, with the DDL and inserts to set it up (this is MySQL, but the same idea applies elsewhere):

use example; 

-- drop the referencing table first; "if exists" keeps the script re-runnable
drop table if exists bucket_item;

drop table if exists bucket;

drop table if exists item;

create table bucket (
    name varchar(1) 
); 

create table item(
    id int 
); 

create table bucket_item(
    bucket_name varchar(1) references bucket(name), 
    item_id int references item(id) 
); 

insert into bucket values ('A'); 
insert into bucket values ('B'); 
insert into bucket values ('C'); 
insert into bucket values ('D'); 

insert into item values (111); 
insert into item values (222); 
insert into item values (333); 
insert into item values (444); 
insert into item values (555); 

insert into bucket_item values ('A',111); 
insert into bucket_item values ('A',222); 
insert into bucket_item values ('A',333); 
insert into bucket_item values ('B',222); 
insert into bucket_item values ('B',333); 
insert into bucket_item values ('B',444); 
insert into bucket_item values ('C',333); 
insert into bucket_item values ('C',444); 
insert into bucket_item values ('D',555); 


-- query to get distinct pairs of buckets 
select distinct 
    a.name, 
    b.name 
from 
    bucket a 
    cross join bucket b
where 
    a.name < b.name 
order by 
    a.name, 
    b.name 
; 

-- query to get the distinct pairs of buckets with their overlap counts
select distinct
    a.name, 
    b.name, 
    count(distinct bi.item_id) 
from 
    bucket a 
    cross join bucket b
    left outer join bucket_item ai on ai.bucket_name = a.name 
    left outer join bucket_item bi on bi.bucket_name = b.name and ai.item_id = bi.item_id 
where 
    a.name < b.name 
group by 
    a.name, 
    b.name 
order by 
    a.name, 
    b.name 
;