2014-02-07 31 views
0

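SQL - performance tuning for counting rows

I have been going round in circles on this for a while and hope someone can help.

I have the following tables: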

Cases
    generic_case_id
    case_subtype

Case_Countries
    generic_case_id
    country_code

Papers
    generic_case_id
    paper_name

Archived_List
    paper_name

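Now I am trying to get a list of distinct paper names that have not been archived and then, for each paper, the number of cases associated with it.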

SELECT paper_name, case_count 
FROM (
    SELECT paper_name, count (1) case_count, row_number() over (order by paper_name DESC) rn, count(*) over() count_rec 
    FROM (
      SELECT distinct(paper_name), generic_case_id 
      FROM papers a, cases b, case_countries c 
      WHERE 
      NOT EXISTS (select paper_name FROM archived_list d WHERE a.paper_name = d.paper_name) 
      AND a.generic_case_id = b.generic_case_id 
      AND b.generic_case_id = c.generic_case_id 
      AND c.country_code = '15618' 
      AND b.case_subtype IN (50022,50023) 
    ) GROUP BY paper_name 
) 
WHERE rn BETWEEN 1 AND 15; 

This seems to work, although it takes a long time to complete. Can anyone suggest a cleaner approach?

Thanks, Ian

+2

Why the triple nesting? The outermost level can certainly be removed. –

+2

Please provide CREATE and INSERT statements for the tables, or use http://sqlfiddle.com/. – anna

+0

Sorry - I am also paginating the results and forgot to remove the outer row_number stuff. Will edit – user2294382

Answers

3

I think this is equivalent:

SELECT a.paper_name, COUNT(DISTINCT a.generic_case_id) AS case_count 
FROM papers a 
    JOIN cases b ON a.generic_case_id = b.generic_case_id 
    JOIN case_countries c ON b.generic_case_id = c.generic_case_id 
WHERE 
    NOT EXISTS (SELECT 1 FROM archived_list d WHERE a.paper_name = d.paper_name) 
    AND c.country_code = '15618' 
    AND b.case_subtype IN (50022,50023) 
GROUP BY a.paper_name ; 

If papers (paper_name, generic_case_id) is unique, then it is also equivalent to:

SELECT a.paper_name, COUNT(*) AS case_count 
FROM papers a 
WHERE 
    NOT EXISTS (SELECT 1 FROM archived_list d WHERE a.paper_name = d.paper_name) 
    AND EXISTS (SELECT 1 FROM case_countries c 
       WHERE a.generic_case_id = c.generic_case_id 
       AND c.country_code = '15618' 
      ) 
    AND EXISTS (SELECT 1 FROM cases b 
       WHERE a.generic_case_id = b.generic_case_id 
       AND b.case_subtype IN (50022,50023) 
      ) 
GROUP BY a.paper_name ; 
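
The equivalence claimed above can be checked on a small data set. The following is a minimal sketch, not the asker's environment: it assumes SQLite (via Python's sqlite3 module) as a stand-in database, and every row of sample data is made up for illustration. The first EXISTS is correlated on a.generic_case_id, because alias b is not in scope inside that subquery.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (generic_case_id INTEGER, case_subtype INTEGER);
CREATE TABLE case_countries (generic_case_id INTEGER, country_code TEXT);
-- The second query relies on this uniqueness assumption.
CREATE TABLE papers (generic_case_id INTEGER, paper_name TEXT,
                     UNIQUE (paper_name, generic_case_id));
CREATE TABLE archived_list (paper_name TEXT);

INSERT INTO cases VALUES (1, 50022), (2, 50023), (3, 50022), (4, 99999);
INSERT INTO case_countries VALUES
    (1, '15618'), (1, '20000'), (2, '15618'), (3, '15618'), (4, '15618');
INSERT INTO papers VALUES
    (1, 'Alpha'), (2, 'Alpha'), (4, 'Alpha'), (3, 'Beta'), (1, 'Gamma');
INSERT INTO archived_list VALUES ('Beta');
""")

# Join form with COUNT(DISTINCT ...).
join_sql = """
SELECT a.paper_name, COUNT(DISTINCT a.generic_case_id) AS case_count
FROM papers a
    JOIN cases b ON a.generic_case_id = b.generic_case_id
    JOIN case_countries c ON b.generic_case_id = c.generic_case_id
WHERE
    NOT EXISTS (SELECT 1 FROM archived_list d WHERE a.paper_name = d.paper_name)
    AND c.country_code = '15618'
    AND b.case_subtype IN (50022, 50023)
GROUP BY a.paper_name
ORDER BY a.paper_name
"""

# EXISTS form with COUNT(*); both subqueries correlate on papers (alias a).
exists_sql = """
SELECT a.paper_name, COUNT(*) AS case_count
FROM papers a
WHERE
    NOT EXISTS (SELECT 1 FROM archived_list d WHERE a.paper_name = d.paper_name)
    AND EXISTS (SELECT 1 FROM case_countries c
                WHERE a.generic_case_id = c.generic_case_id
                  AND c.country_code = '15618')
    AND EXISTS (SELECT 1 FROM cases b
                WHERE a.generic_case_id = b.generic_case_id
                  AND b.case_subtype IN (50022, 50023))
GROUP BY a.paper_name
ORDER BY a.paper_name
"""

join_rows = conn.execute(join_sql).fetchall()
exists_rows = conn.execute(exists_sql).fetchall()
print(join_rows)
print(exists_rows)
```

On this sample both forms report two cases for Alpha and one for Gamma, while the archived paper Beta is excluded. Agreement on one data set is only a sanity check, not a proof of equivalence.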
2

You can simplify the structure and drop count_rec, since you are not using it. Here is a modified form of the query:

 SELECT paper_name, count(distinct generic_case_id) as case_count   
     FROM papers a join 
      cases b 
      on a.generic_case_id = b.generic_case_id join 
      case_countries c 
      on b.generic_case_id = c.generic_case_id 
     WHERE NOT EXISTS (select paper_name 
         FROM archived_list d 
         WHERE a.paper_name = d.paper_name 
         ) AND 
      c.country_code = '15618' AND
      b.case_subtype IN (50022,50023) 
     GROUP BY paper_name; 

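My guess is that the subquery is what takes the time. Do you have an index on archived_list(paper_name)? I also suspect that count(distinct generic_case_id) could be just count(*), but the distinct matches the logic of your original query (it is necessary if a case can be in more than one country). Also, if country_code is really an integer, you should remove the single quotes around the value. In some cases a type mismatch affects whether an index is used.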
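
To illustrate the index question, here is a small sketch that looks at the plan for the correlated lookup before and after creating an index. It assumes SQLite through Python's sqlite3 module, since the asker's database is not stated; EXPLAIN QUERY PLAN is SQLite-specific, and the index name idx_archived_list_paper_name and the sample rows are hypothetical. Other databases expose plans through their own tools, and their optimizers may choose differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archived_list (paper_name TEXT)")
# Made-up rows, for illustration only.
conn.executemany("INSERT INTO archived_list VALUES (?)",
                 [("Beta",), ("Delta",)])

# The lookup that the NOT EXISTS subquery performs for each outer row.
probe = "SELECT 1 FROM archived_list d WHERE d.paper_name = ?"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row holds the plan text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("Alpha",)).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(probe)
conn.execute(
    "CREATE INDEX idx_archived_list_paper_name ON archived_list (paper_name)"
)
plan_after = plan(probe)

print(plan_before)  # a table scan, as no index exists yet
print(plan_after)   # mentions idx_archived_list_paper_name
```

The exact plan wording varies between SQLite versions, so only the presence of the index name in the second plan is worth relying on.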

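The subquery may be expensive (a table with "archive" in its name suggests it is large). In the where clause, it runs for every row before aggregation. Moving it to the having clause might help: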

 SELECT paper_name, count(distinct generic_case_id) as case_count   
     FROM papers a join 
      cases b 
      on a.generic_case_id = b.generic_case_id join 
      case_countries c 
      on b.generic_case_id = c.generic_case_id 
     WHERE c.country_code = '15618' AND
      b.case_subtype IN (50022,50023) 
     GROUP BY paper_name 
     HAVING NOT EXISTS (select paper_name 
         FROM archived_list d 
         WHERE a.paper_name = d.paper_name 
         ) 

Finally, switching to an outer join might improve performance:

select pc.paper_name, pc.case_count 
FROM (SELECT paper_name, count(distinct generic_case_id) as case_count   
     FROM papers a join 
      cases b 
      on a.generic_case_id = b.generic_case_id join 
      case_countries c 
      on b.generic_case_id = c.generic_case_id 
     WHERE c.country_code = '15618' AND
      b.case_subtype IN (50022,50023) 
     GROUP BY paper_name 
    ) pc left outer join 
    archived_list al 
    on pc.paper_name = al.paper_name 
where al.paper_name is null; 
0

Query:

SELECT a.paper_name, COUNT(DISTINCT a.generic_case_id) AS case_count 
FROM papers a 
    JOIN cases b ON a.generic_case_id = b.generic_case_id 
    JOIN case_countries c ON b.generic_case_id = c.generic_case_id 
    LEFT JOIN archived_list d ON a.paper_name = d.paper_name 
WHERE d.paper_name is null 
    AND c.country_code = '15618' 
    AND b.case_subtype IN (50022,50023) 
GROUP BY a.paper_name
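
The "d.paper_name is null" filter only acts as an anti-join when archived_list is attached with an outer join. With a plain inner join, every surviving row has a matching, non-null d.paper_name, so the filter removes everything. The sketch below shows the difference on a reduced schema (papers and archived_list only). It assumes SQLite through Python's sqlite3 module, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE papers (generic_case_id INTEGER, paper_name TEXT);
CREATE TABLE archived_list (paper_name TEXT);
INSERT INTO papers VALUES (1, 'Alpha'), (2, 'Alpha'), (3, 'Beta'), (1, 'Gamma');
INSERT INTO archived_list VALUES ('Beta');
""")

# Same query text; only the join keyword changes.
template = """
SELECT a.paper_name, COUNT(DISTINCT a.generic_case_id) AS case_count
FROM papers a
    {join} archived_list d ON a.paper_name = d.paper_name
WHERE d.paper_name IS NULL
GROUP BY a.paper_name
ORDER BY a.paper_name
"""

inner_rows = conn.execute(template.format(join="JOIN")).fetchall()
left_rows = conn.execute(template.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # inner join: no row can have a NULL d.paper_name
print(left_rows)   # left join: unarchived papers survive the filter
```

Whether the outer-join form is faster than NOT EXISTS depends on the database and its optimizer, so it is worth comparing execution plans for both.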