2012-12-22 78 views
0

SELECT COUNT of distinct and common rows from two tables

I have two tables, T1 and T2, as follows:

CATEGORY  ID 
1   1100 
1   1200 
1   1300 
1   1500 
2   2000 
2   2100 
2   2300 
2   2500 

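I need to know:

  • How many rows are the same in T1 and T2 (same CATEGORY and ID)
  • How many rows from T2 are not in T1
  • How many rows from T1 are not in T2

I have been racking my brain over this since this morning, and tried the following to get the common rows: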

select count(*) from T1, T2 WHERE 
T1.CATEGORY = T2.CATEGORY AND T1.ID = T2.ID; 

But I can't figure out how to get the unique rows (those only in T1 or only in T2).

Answers

5

Question 1

SELECT COUNT(*) totalCount 
FROM T1 a 
     INNER JOIN T2 b 
      ON a.Category = b.Category AND 
       a.ID = b.ID 

Question 2 (using LEFT JOIN)

SELECT COUNT(*) totalCount 
FROM T2 a 
     LEFT JOIN T1 b 
      ON a.Category = b.Category AND 
       a.ID = b.ID 
WHERE b.Category IS NULL 

Question 3 (using LEFT JOIN)

SELECT COUNT(*) totalCount 
FROM T1 a 
     LEFT JOIN T2 b 
      ON a.Category = b.Category AND 
       a.ID = b.ID 
WHERE b.Category IS NULL 
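
To check the three queries against each other, here is a minimal sketch that runs them on an in-memory SQLite database through Python's sqlite3 module. The thread does not say which database is in use, so SQLite is only a stand-in. T1 holds the rows from the question; the T2 rows and the helper names (`count_rows`, `compare_tables`) are made up for illustration.

```python
import sqlite3

# T1 matches the table in the question; T2 is hypothetical sample data
# chosen so that the two tables overlap only partially.
T1_ROWS = [(1, 1100), (1, 1200), (1, 1300), (1, 1500),
           (2, 2000), (2, 2100), (2, 2300), (2, 2500)]
T2_ROWS = [(1, 1100), (1, 1200), (1, 1400),
           (2, 2000), (2, 2600)]


def count_rows(conn, sql):
    """Run a query that returns a single COUNT value and return it."""
    return conn.execute(sql).fetchone()[0]


def compare_tables(t1_rows, t2_rows):
    """Return (common, only_in_t2, only_in_t1) for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T1 (CATEGORY INTEGER NOT NULL, ID INTEGER NOT NULL)")
    conn.execute("CREATE TABLE T2 (CATEGORY INTEGER NOT NULL, ID INTEGER NOT NULL)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?)", t1_rows)
    conn.executemany("INSERT INTO T2 VALUES (?, ?)", t2_rows)

    # Question 1: rows present in both tables.
    common = count_rows(conn, """
        SELECT COUNT(*) FROM T1 a
        INNER JOIN T2 b ON a.CATEGORY = b.CATEGORY AND a.ID = b.ID""")
    # Question 2: rows of T2 with no partner in T1 (anti-join).
    only_t2 = count_rows(conn, """
        SELECT COUNT(*) FROM T2 a
        LEFT JOIN T1 b ON a.CATEGORY = b.CATEGORY AND a.ID = b.ID
        WHERE b.CATEGORY IS NULL""")
    # Question 3: rows of T1 with no partner in T2 (anti-join).
    only_t1 = count_rows(conn, """
        SELECT COUNT(*) FROM T1 a
        LEFT JOIN T2 b ON a.CATEGORY = b.CATEGORY AND a.ID = b.ID
        WHERE b.CATEGORY IS NULL""")
    conn.close()
    return common, only_t2, only_t1


print(compare_tables(T1_ROWS, T2_ROWS))  # (3, 2, 5)
```

With this sample data the common count plus the T1-only count adds up to the 8 rows of T1, which is a quick sanity check as long as neither table contains duplicate (CATEGORY, ID) pairs.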

The expert answers fast :-) – vels4j

0

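If you cannot assume that the rows are distinct, then you need a slightly different approach. Here is a method that answers all three questions at once and takes duplicate rows into account: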

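select (case when isT1 = 1 and isT2 = 1 then 'BOTH'
             when isT1 = 1 then 'T1-Only'
             else 'T2-Only'
        end) as WhereRow,
       count(*) as NumDistinctRows,
       sum(cnt) as NumTotalRows
from (-- collapse to one row per (category, id) so both flags can be set
      select category, id,
             sum(cnt) as cnt,      -- total rows for this pair across both tables
             max(isT1) as isT1,    -- 1 if the pair occurs in t1
             max(isT2) as isT2     -- 1 if the pair occurs in t2
      from ((select category, id, count(*) as cnt, 1 as isT1, 0 as isT2
             from t1
             group by category, id
            ) union all
            (select category, id, count(*) as cnt, 0 as isT1, 1 as isT2
             from t2
             group by category, id
            )
           ) u
      group by category, id
     ) t
group by isT1, isT2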
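
As a way to see what the duplicate-aware approach returns, here is a small sketch using Python's sqlite3 module. It assumes a variant of the query that first reduces the UNION ALL result to one row per (category, id), so that a pair found in both tables carries both flags. The sample rows, including the duplicates, are invented for illustration, and SQLite is only a stand-in for whatever database the asker uses.

```python
import sqlite3

# Hypothetical sample data with duplicates:
# (1, 1100) appears twice in t1, (2, 2000) appears twice in t2.
T1_ROWS = [(1, 1100), (1, 1100), (1, 1200), (1, 1300)]
T2_ROWS = [(1, 1100), (2, 2000), (2, 2000)]

QUERY = """
SELECT CASE WHEN isT1 = 1 AND isT2 = 1 THEN 'BOTH'
            WHEN isT1 = 1 THEN 'T1-Only'
            ELSE 'T2-Only' END AS WhereRow,
       COUNT(*) AS NumDistinctRows,
       SUM(cnt) AS NumTotalRows
FROM (SELECT category, id,
             SUM(cnt)  AS cnt,   -- rows for this pair across both tables
             MAX(isT1) AS isT1,  -- 1 if the pair occurs in t1
             MAX(isT2) AS isT2   -- 1 if the pair occurs in t2
      FROM (SELECT category, id, COUNT(*) AS cnt, 1 AS isT1, 0 AS isT2
            FROM t1 GROUP BY category, id
            UNION ALL
            SELECT category, id, COUNT(*) AS cnt, 0 AS isT1, 1 AS isT2
            FROM t2 GROUP BY category, id) u
      GROUP BY category, id) t
GROUP BY isT1, isT2
ORDER BY WhereRow
"""


def summarize(t1_rows, t2_rows):
    """Return a list of (WhereRow, NumDistinctRows, NumTotalRows) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (category INTEGER NOT NULL, id INTEGER NOT NULL)")
    conn.execute("CREATE TABLE t2 (category INTEGER NOT NULL, id INTEGER NOT NULL)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?)", t1_rows)
    conn.executemany("INSERT INTO t2 VALUES (?, ?)", t2_rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in summarize(T1_ROWS, T2_ROWS):
    print(row)
# ('BOTH', 1, 3)
# ('T1-Only', 2, 2)
# ('T2-Only', 1, 2)
```

NumDistinctRows counts each (category, id) pair once, while NumTotalRows adds up every physical row, so the pair (1, 1100) contributes 1 and 3 respectively.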

1

DROP SCHEMA IF EXISTS tmp CASCADE; 
CREATE SCHEMA tmp ; 
SET search_path=tmp; 

CREATE TABLE lutser 
     (id INTEGER NOT NULL 
     , category INTEGER NOT NULL 
     ); 
INSERT INTO lutser(category, id) VALUES 
(1,1100) ,(1,1200) ,(1,1300) ,(1,1500) 
,(2,2000) ,(2,2100) ,(2,2300) ,(2,2500) 
,(1,3500) -- added these 
,(2,3500) 
     ; 

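These queries build a "bitmask" (category 1 == 1, category 2 == 2) and add the two values together. So when an id exists in both sets the mask is 3; when it exists only in the first set the mask is 1, and when it exists only in the second set it is 2. An outer join plus aggregation does the trick here.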

 -- 
     -- CTE version 
     -- 
WITH flags AS (
     WITH one AS (SELECT category AS flag , id FROM lutser WHERE category = 1) 
     , two AS (SELECT category AS flag , id FROM lutser WHERE category = 2) 
     SELECT COALESCE(one.flag, 0) + COALESCE(two.flag, 0) AS flag 
     FROM one 
     FULL OUTER JOIN two ON two.id = one.id 
     ) 
SELECT flag, COUNT(*) 
FROM flags 
GROUP BY flag; 

     -- 
     -- Non-CTE version 
     -- 
SELECT COALESCE(one.flag, 0) + COALESCE(two.flag, 0) AS flags 
     , COUNT(*) 
FROM (
     SELECT category AS flag , id 
     FROM lutser WHERE category = 1 
     ) one 
FULL OUTER JOIN (
     SELECT category AS flag , id 
     FROM lutser WHERE category = 2 
     ) two ON two.id = one.id 
GROUP BY flags; 

Result (for both queries ;-):

flags | count 
-------+------- 
    1 |  4 
    2 |  4 
    3 |  1 
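
The same bitmask can also be built without FULL OUTER JOIN, which may help on engines that lack it. The sketch below is a swapped-in technique, not the query from this answer: it sums the category values per id after removing duplicates, and runs on SQLite through Python's sqlite3 module. The function name `bitmask_counts` is made up; the rows are the ones inserted into `lutser` above.

```python
import sqlite3

# The rows inserted into lutser in the answer, as (category, id).
ROWS = [(1, 1100), (1, 1200), (1, 1300), (1, 1500),
        (2, 2000), (2, 2100), (2, 2300), (2, 2500),
        (1, 3500), (2, 3500)]

# Per id, add up the distinct categories it occurs in:
# 1 = only category 1, 2 = only category 2, 3 = both.
QUERY = """
SELECT flags, COUNT(*)
FROM (SELECT id, SUM(category) AS flags
      FROM (SELECT DISTINCT category, id
            FROM lutser
            WHERE category IN (1, 2)) d
      GROUP BY id) m
GROUP BY flags
ORDER BY flags
"""


def bitmask_counts(rows):
    """Return a list of (flags, count) tuples for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lutser (id INTEGER NOT NULL, category INTEGER NOT NULL)")
    conn.executemany("INSERT INTO lutser (category, id) VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(bitmask_counts(ROWS))  # [(1, 4), (2, 4), (3, 1)]
```

The DISTINCT step matters: without it a duplicated row would be added twice and push the mask outside the 1 to 3 range.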

I think MySQL does not support FULL JOIN. –

Thanks for your answer, it's really good. But we have 10 million rows, and it really eats up memory. – madkitty

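This is the only way to answer your three questions in one query. The 10M rows are not relevant here; *every* solution would benefit from some kind of index on id. – wildplasser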
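
To illustrate the point about indexing, here is a rough sketch that adds a composite index on (CATEGORY, ID) to both tables and then counts the rows found only in T1 with a NOT EXISTS anti-join. The index names, the generated data, and the choice of a composite index rather than an index on ID alone are assumptions for illustration. SQLite via Python's sqlite3 stands in for the real database, and the plan output will look different on other engines.

```python
import sqlite3


def build(conn, n=1000):
    """Create T1 with ids 0..n-1 and T2 with ids n/2..3n/2-1, plus indexes."""
    conn.execute("CREATE TABLE T1 (CATEGORY INTEGER NOT NULL, ID INTEGER NOT NULL)")
    conn.execute("CREATE TABLE T2 (CATEGORY INTEGER NOT NULL, ID INTEGER NOT NULL)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?)",
                     [(i % 2 + 1, i) for i in range(n)])
    conn.executemany("INSERT INTO T2 VALUES (?, ?)",
                     [(i % 2 + 1, i) for i in range(n // 2, n + n // 2)])
    # Hypothetical index names; the columns match the join condition.
    conn.execute("CREATE INDEX idx_t1_cat_id ON T1 (CATEGORY, ID)")
    conn.execute("CREATE INDEX idx_t2_cat_id ON T2 (CATEGORY, ID)")


# Rows of T1 that have no matching (CATEGORY, ID) in T2.
ANTI_JOIN = """
SELECT COUNT(*)
FROM T1 a
WHERE NOT EXISTS (SELECT 1 FROM T2 b
                  WHERE b.CATEGORY = a.CATEGORY AND b.ID = a.ID)
"""

conn = sqlite3.connect(":memory:")
build(conn)
only_t1 = conn.execute(ANTI_JOIN).fetchone()[0]
# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + ANTI_JOIN)]
conn.close()

print(only_t1)
for step in plan:
    print(step)
```

Printing the plan is a quick way to check whether the lookup into T2 goes through the index instead of a full scan for every row of T1.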