2012-06-07 109 views
0

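SQL Server - Nested query takes 40 minutes to run

I have a very large web forum application (about 20 million posts since 2001) running off a SQL Server 2012 database. The data file is about 40GB in size.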

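I have added indexes to the tables on the relevant fields, but this query (which shows the date range of posts for each forum) still takes about 40 minutes to run: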

SELECT 
    T2.ForumId, 
    Forums.Title, 
    T2.ForumThreads, 
    T2.ForumPosts, 
    T2.ForumStart, 
    T2.ForumStop 

FROM 
    Forums 
    INNER JOIN (

    SELECT 
     Min(ThreadStart) As ForumStart, 
     Max(ThreadStop) As ForumStop, 
     Count(*) As ForumThreads, 
     Sum(ThreadPosts) As ForumPosts, 
     Threads.ForumId 
    FROM 
     Threads 
     INNER JOIN (

      SELECT 
       Min(Posts.DateTime) As ThreadStart, 
       Max(Posts.DateTime) As ThreadStop, 
       Count(*) As ThreadPosts, 
       Posts.ThreadId 
      FROM 
       Posts 
      GROUP BY 
       Posts.ThreadId 

     ) As P2 ON Threads.ThreadId = P2.ThreadId 

    GROUP BY 
     Threads.ForumId 

) AS T2 ON T2.ForumId = Forums.ForumId 

How can I speed this up?

UPDATE:

Here is the estimated execution plan, read from right to left:

[Path 1] 

Clustered Index Scan (Clustered) [Posts].[PK_Posts], Cost: 98% 
Hash Match (Partial Aggregate), Cost: 2% 
Parallelism (Repartition Streams), Cost: 0% 
Hash Match (Aggregate), Cost: 0% 
Compute Scalar, Cost: 0% 
Bitmap (Bitmap Create), Cost: 0% 

[Path 2] 

Index Scan (NonClustered) [Threads].[IX_ForumId], Cost: 0% 
Parallelism (Repartition Streams), Cost: 0% 

[Path 1 and 2 converge into Path 3] 

Hash Match (Inner Join), Cost: 0% 
Hash Match (Partial Aggregate), Cost: 0% 
Parallelism (Repartition Streams), Cost: 0% 
Sort, Cost: 0% 
Stream Aggregate (Aggregate), Cost: 0% 
Compute Scalar, Cost: 0% 

[Path 4] 
Clustered Index Seek (Clustered) [Forums].[PK_Forums], Cost: 0% 

[Path 3 and 4 converge into Path 5] 

Nested Loops (Inner Join), Cost: 0% 
Parallelism (Gather Streams), Cost: 0% 
SELECT, Cost: 0% 
+0

What does the execution plan for the query look like? – Taryn

+0

40 GB? That is not unusual... add indexes! – mschr

+0

Turn those "scans" into "seeks" by adding or changing indexes and it will perform better. You may also want to partition the tables. – SQLMason

Answers

0

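I added some indexes to the database and it improved the speed dramatically. Execution time is now about 20 seconds (!!). I admit that many of the indexes I added were guesses (or simply added at random).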
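The answer does not say which indexes made the difference. As one plausible candidate: the plan in the question puts 98% of the estimated cost on a clustered index scan of Posts, while the innermost query only reads ThreadId and DateTime. A narrow nonclustered index on those two columns would let SQL Server read a much smaller structure that is already ordered by thread. The index name and the dbo schema below are assumptions, not something stated in the thread.

```sql
-- Hypothetical index name; assumes the tables live in the dbo schema.
-- Keyed on (ThreadId, DateTime) so the per-thread MIN/MAX/COUNT can be
-- computed from this index alone, without scanning the wide clustered index.
CREATE NONCLUSTERED INDEX IX_Posts_ThreadId_DateTime
    ON dbo.Posts (ThreadId, [DateTime]);
```

Comparing the estimated plan before and after creating an index like this is a more reliable guide than adding indexes by guesswork.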

1

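Have you tried putting those 2 derived tables into #temp tables? SQL Server will build statistics on them (single-column), and you can also create indexes on them.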
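The temp-table suggestion above can be sketched roughly as follows. This is only an illustration: it materializes the innermost (per-thread) derived table, since that is the large one, and rolls up to forum level in the final query. The names #ThreadStats and IX_ThreadStats and the dbo schema are assumptions.

```sql
-- Step 1: materialize the per-thread statistics (the innermost derived table).
-- #ThreadStats is a hypothetical name.
SELECT
    p.ThreadId,
    MIN(p.[DateTime]) AS ThreadStart,
    MAX(p.[DateTime]) AS ThreadStop,
    COUNT(*)          AS ThreadPosts
INTO #ThreadStats
FROM dbo.Posts AS p
GROUP BY p.ThreadId;

-- Index the temp table on the join column (unique because of the GROUP BY).
CREATE UNIQUE CLUSTERED INDEX IX_ThreadStats ON #ThreadStats (ThreadId);

-- Step 2: roll the per-thread rows up to forum level and join to Forums.
SELECT
    f.ForumId,
    f.Title,
    COUNT(*)           AS ForumThreads,
    SUM(s.ThreadPosts) AS ForumPosts,
    MIN(s.ThreadStart) AS ForumStart,
    MAX(s.ThreadStop)  AS ForumStop
FROM dbo.Forums AS f
INNER JOIN dbo.Threads AS t ON t.ForumId = f.ForumId
INNER JOIN #ThreadStats AS s ON s.ThreadId = t.ThreadId
GROUP BY f.ForumId, f.Title;

DROP TABLE #ThreadStats;
```

Step 1 still has to read every post once, so the gain, if any, comes from the second step joining against an indexed table with real statistics.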

Also, at first glance an indexed view could help you, since you are doing a lot of aggregation.
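A minimal sketch of the indexed-view idea, with an important caveat: SQL Server does not allow MIN or MAX in an indexed view, so only the counts can be precomputed this way and the date range would still need a separate query. The view and index names and the dbo schema are assumptions. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical view name. Indexed views require SCHEMABINDING, two-part
-- table names, and COUNT_BIG(*) whenever GROUP BY is used.
CREATE VIEW dbo.vForumPostCounts
WITH SCHEMABINDING
AS
SELECT
    t.ForumId,
    COUNT_BIG(*) AS ForumPosts
FROM dbo.Threads AS t
INNER JOIN dbo.Posts AS p ON p.ThreadId = t.ThreadId
GROUP BY t.ForumId;
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vForumPostCounts
    ON dbo.vForumPostCounts (ForumId);
GO

-- NOEXPAND asks the optimizer to read the materialized rows directly;
-- depending on the edition this hint may be required for the index to be used.
SELECT ForumId, ForumPosts
FROM dbo.vForumPostCounts WITH (NOEXPAND);
```

The trade-off is that every insert or delete on Posts now also maintains the view, which may matter on a busy forum.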

0

Do you really need to aggregate twice? Would this query give you the same result?

SELECT 
    T2.ForumId, 
    Forums.Title, 
    T2.ForumThreads, 
    T2.ForumPosts, 
    T2.ForumStart, 
    T2.ForumStop 
FROM 
    Forums 
INNER JOIN ( 
    SELECT 
     Min(ThreadStart) As ForumStart, 
     Max(ThreadStop) As ForumStop, 
     -- P2 has one row per post, so threads must be counted distinctly
     Count(DISTINCT Threads.ThreadId) As ForumThreads, 
     -- each joined row is exactly one post
     Count(*) As ForumPosts, 
     Threads.ForumId 
    FROM 
     Threads 
    INNER JOIN ( 
       -- no per-thread aggregation here: a Count(*) without GROUP BY
       -- would not compile, so the raw post rows are passed through
       SELECT 
        Posts.DateTime As ThreadStart, 
        Posts.DateTime As ThreadStop, 
        Posts.ThreadId 
       FROM 
        Posts 
       ) As P2 ON Threads.ThreadId = P2.ThreadId 
    GROUP BY 
     Threads.ForumId 
    ) AS T2 ON T2.ForumId = Forums.ForumId 
1

How about something like this? You get the idea, anyway... when you do SELECT FROM

SELECT f.ForumID, 
f.Title, 
MIN(p.[DateTime]) as ForumStart, 
MAX(p.[DateTime]) as ForumStop, 
COUNT(DISTINCT f.ForumID) as ForumPosts, -- as written this is always 1 per group; COUNT(*) would count posts
COUNT(DISTINCT t.ThreadID) as ForumThreads 
FROM Forums f 
INNER JOIN Threads t 
ON f.ForumID = t.ForumID 
INNER JOIN Posts p 
ON p.ThreadID = t.ThreadID -- join each post to its thread (was p.ThreadID = p.ThreadID)
GROUP BY f.ForumID, f.Title 
+1

+1, I posted almost the same solution (and deleted it when I saw yours). But shouldn't 'f.forumid' be changed to '*'? That would be the number of posts per forum ID. –

1

Indexes can help, but the results of the subqueries are not indexed. Joining them is probably what causes the performance hit.

As Buckley suggested, I would try storing the intermediate results in #temp tables and adding indexes to them before running the final query.

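However, the outer SELECT does not include any thread-specific information. It looks like the query just selects the min/max dates per forum. If so, you can simply get the min/max/count of posts grouped by forum.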

0

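If you denormalize by adding ForumId to the Posts table, you will be able to query all of the statistics directly from the Posts table. With the right indexes, this would probably perform quite well. Of course, it requires a small change to your code so that ForumId is included when inserting into the Posts table...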
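A rough sketch of what that denormalization could look like. The INT column type, the index name, and the dbo schema are assumptions; the column type should match Threads.ForumId. GO is the batch separator used by SSMS and sqlcmd, needed here so the new column exists before the UPDATE is compiled.

```sql
-- Assumes ForumId is an INT in Threads; adjust the type to match.
ALTER TABLE dbo.Posts ADD ForumId INT NULL;
GO

-- Backfill from Threads. On roughly 20 million rows this is one large
-- transaction; running it in batches may be preferable in practice.
UPDATE p
SET    p.ForumId = t.ForumId
FROM   dbo.Posts AS p
INNER JOIN dbo.Threads AS t ON t.ThreadId = p.ThreadId;

-- Hypothetical index name; covers every column the statistics query reads.
CREATE NONCLUSTERED INDEX IX_Posts_ForumId
    ON dbo.Posts (ForumId, [DateTime]) INCLUDE (ThreadId);
GO

-- Forum statistics straight from Posts, with no join to Threads.
SELECT
    p.ForumId,
    MIN(p.[DateTime])          AS ForumStart,
    MAX(p.[DateTime])          AS ForumStop,
    COUNT(*)                   AS ForumPosts,
    COUNT(DISTINCT p.ThreadId) AS ForumThreads
FROM dbo.Posts AS p
GROUP BY p.ForumId;
```

The forum title still needs a join to Forums, but that join is against one row per forum. If threads can be moved between forums, the code that moves them would also have to update Posts.ForumId.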
