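Calling multiple ROW_NUMBER() functions in a single SQL query

I'm trying to set up some data to calculate multiple medians in SQL Server 2008, but I'm running into a performance problem. Right now I'm using the ROW_NUMBER pattern shown below. Yes, I'm not using a CTE, but using one wouldn't fix the problem I'm having anyway: performance is poor because the ROW_NUMBER sub-queries run in serial rather than in parallel.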
Here is a complete example. Below the SQL, I explain the problem in more detail.
-- build the example table
CREATE TABLE #TestMedian (
StateID INT,
TimeDimID INT,
ConstructionStatusID INT,
PopulationSize BIGINT,
SquareMiles BIGINT
);
INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 100000, 200000);
INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 200000, 300000);
INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 300000, 400000);
INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 100000, 200000);
INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 250000, 300000);
INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 350000, 400000);
-- TRUNCATE TABLE #TestMedian
-- number the rows of each group once per measure, and count the rows in each group
SELECT
StateID
,TimeDimID
,ConstructionStatusID
,NumberOfRows = COUNT(*) OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID)
,PopulationSizeRowNum = ROW_NUMBER() OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID ORDER BY PopulationSize)
,SquareMilesRowNum = ROW_NUMBER() OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID ORDER BY SquareMiles)
,PopulationSize
,SquareMiles
INTO #MedianData
FROM #TestMedian;
-- median population size: average the middle row (odd count) or the two middle rows (even count)
-- note: AVG over a BIGINT column returns a BIGINT, so a fractional median is truncated
SELECT MinRowNum = MIN(PopulationSizeRowNum), MaxRowNum = MAX(PopulationSizeRowNum), StateID, TimeDimID, ConstructionStatusID, MedianPopulationSize = AVG(PopulationSize)
FROM #MedianData T
WHERE PopulationSizeRowNum IN ((NumberOfRows + 1)/2, (NumberOfRows + 2)/2)
GROUP BY StateID, TimeDimID, ConstructionStatusID;
-- median square miles, same technique
SELECT MinRowNum = MIN(SquareMilesRowNum), MaxRowNum = MAX(SquareMilesRowNum), StateID, TimeDimID, ConstructionStatusID, MedianSquareMiles = AVG(SquareMiles)
FROM #MedianData T
WHERE SquareMilesRowNum IN ((NumberOfRows + 1)/2, (NumberOfRows + 2)/2)
GROUP BY StateID, TimeDimID, ConstructionStatusID;
DROP TABLE #MedianData;
DROP TABLE #TestMedian;
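The problem with this query is that SQL Server executes both of the "ROW_NUMBER() OVER ..." sub-queries in serial, not in parallel. So if I have 10 of these ROW_NUMBER calculations, it computes them one after another and the run time grows linearly, which stinks. I'm running this query on an 8-way, 32GB system and I would love some parallelism. I'm trying to run this type of query against a table with 5,000,000 rows.
I can tell it's doing this by looking at the query plan and seeing the Sorts in the same execution path (the query plan's XML doesn't display well on SO).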
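For anyone who wants to look at the plan themselves, here is a minimal sketch of one way to capture it. It assumes the statements are run in the same session while #TestMedian still exists (that is, before the DROP TABLE statements above), and it leaves out the INTO clause so only the plan for the two ROW_NUMBER calculations comes back.

```sql
-- return the actual execution plan as XML after each statement
SET STATISTICS XML ON;

-- the same two ROW_NUMBER calculations as in the example above
SELECT
StateID
,TimeDimID
,ConstructionStatusID
,PopulationSizeRowNum = ROW_NUMBER() OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID ORDER BY PopulationSize)
,SquareMilesRowNum = ROW_NUMBER() OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID ORDER BY SquareMiles)
FROM #TestMedian;

SET STATISTICS XML OFF;
```

The plan comes back as an extra result set. The thing to look for is the Sort operators feeding each ROW_NUMBER calculation and whether they sit on the same execution path.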
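So my question is this: how can I change this query so that the ROW_NUMBER queries are executed in parallel? Is there a completely different technique I can use to prepare the data for multiple median calculations?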
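+1, enough code to try it out on my system! – 2009-09-04 17:19:28
+1 because I didn't know you could use the OVER clause outside of the ranking functions, and in SQL 2005 too. Hooray! – 2009-09-04 18:13:03
Philip: for ordinary aggregate functions, though, only the PARTITION BY clause is supported, not the ORDER BY part :-( – RBarryYoung 2009-09-04 19:30:42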