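How to optimize SQL Server columnstore alignment
I have a table with a clustered columnstore index that stores our IoT metrics (time series data). It contains more than 1 billion rows and has the following structure: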
CREATE TABLE [dbo].[Data](
[DeviceId] [bigint] NOT NULL,
[MetricId] [smallint] NOT NULL,
[TimeStamp] [datetime2](2) NOT NULL,
[Value] [real] NOT NULL
)
CREATE CLUSTERED INDEX [PK_Data] ON [dbo].[Data] ([TimeStamp],[DeviceId],[MetricId]) --WITH (DROP_EXISTING = ON)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data] ON [dbo].[Data] WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)
There are about 10,000 distinct DeviceId values, and the TimeStamp values range from 2008 to the present. A typical query against this table looks like this:
SET STATISTICS TIME, IO ON
SELECT
[DeviceId]
,[MetricId]
,DATEADD(day, DATEDIFF(day, '2005-01-01', [TimeStamp]), '2005-01-01') As [Date]
,MIN([Value]) as [Min]
,MAX([Value]) as [Max]
,AVG([Value]) as [Avg]
,SUM([Value]) as [Sum]
,COUNT([Value]) as [Count]
FROM
[dbo].[Data]
WHERE
[DeviceId] = 6077129891325167032
AND [MetricId] = 1000
AND [TimeStamp] BETWEEN '2017-07-01' AND '2017-07-30'
GROUP BY
[DeviceId]
,[MetricId]
,DATEDIFF(day, '2005-01-01', [TimeStamp])
ORDER BY
[DeviceId]
,[MetricId]
,DATEDIFF(day, '2005-01-01', [TimeStamp])
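When I run this query, I get the following performance metrics. I believe queries like the one above currently perform too many segment reads: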
Table 'Data'. Scan count 2, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 5257, lob physical reads 9, lob read-ahead reads 4000.
Table 'Data'. Segment reads 11, segment skipped 764.
I believe this is not very well optimized, because we are reading 11 of 212 segments before the grouping/aggregation happens.
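One way to see why segments are or are not skipped is to look at the per-segment metadata that segment elimination relies on. The query below is a minimal sketch against the `sys.column_store_segments` catalog view, assuming the `dbo.Data` table from the question. Note that `min_data_id` and `max_data_id` can be encoded values rather than the raw column values, so they are best read as relative ranges, not literal DeviceId values.

```sql
-- Sketch: list the min/max metadata of every DeviceId segment in dbo.Data.
-- Assumes the table and clustered columnstore index from the question.
SELECT
    c.name        AS ColumnName,
    s.segment_id  AS SegmentId,
    s.row_count   AS SegmentRows,
    s.min_data_id AS MinDataId,   -- may be an encoded value, not the raw DeviceId
    s.max_data_id AS MaxDataId    -- may be an encoded value, not the raw DeviceId
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
    ON p.hobt_id = s.hobt_id
JOIN sys.columns AS c
    ON c.object_id = p.object_id
   AND c.column_id = s.column_id
WHERE p.object_id = OBJECT_ID(N'dbo.Data')
  AND c.name = N'DeviceId'
ORDER BY s.segment_id;
```

If most segments show nearly the same wide min/max range, a filter on a single DeviceId cannot rule many of them out, which would be consistent with a poor alignment score for that column.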
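I then ran Niko Neugebauer's excellent script to verify our setup and columnstore alignment (https://github.com/NikoNeugebauer/CISL/blob/master/Azure/alignment.sql). After rebuilding the clustered columnstore index, I got this result:
The MetricId and TimeStamp columns have a 100% optimal alignment score. How can we make sure the DeviceId column is also well aligned? Can the column order I used in the initial clustered (rowstore) index be optimized?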
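For illustration only, one variation that could be tested is to rebuild the rowstore clustered index with DeviceId as the leading key and then convert it back to columnstore, reusing the options from the question. The key order below is an assumption to experiment with, not a confirmed fix. `MAXDOP = 1` is commonly used to keep the rows in rowstore order during the conversion, although that ordering is not a documented guarantee.

```sql
-- Sketch: re-sort the data with DeviceId leading, then rebuild the columnstore.
-- The key order (DeviceId, MetricId, TimeStamp) is an assumption to be tested.

-- Step 1: convert back to a rowstore clustered index in the new key order.
CREATE CLUSTERED INDEX [PK_Data]
    ON [dbo].[Data] ([DeviceId], [MetricId], [TimeStamp])
    WITH (DROP_EXISTING = ON);

-- Step 2: convert to clustered columnstore on a single thread, so row groups
-- are built from the sorted rowstore data.
CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data]
    ON [dbo].[Data]
    WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
```

There is a trade-off: with DeviceId leading, each row group is likely to span a wide TimeStamp range, so queries that filter only on time may skip fewer segments than before. On a table of this size both steps are expensive, so it is worth trying on a copy first and re-running the alignment script afterwards.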
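Please post the query plan as XML as well – TheGameiswar
If you are using SQL Server 2016, try using DBCC clonedb and share the database so that others can reproduce the exact situation you are facing. If you are not using 2016, you can script out the table schema, indexes, and statistics and try sharing the script – TheGameiswar
@TheGameiswar [DBCC CLONEDATABASE](https://support.microsoft.com/en-gb/help/3177838/how-to-use-dbcc-clonedatabase-to-generate-a-schema-and-statistics-only) is available from SQL Server 2014 SP2 onwards :) – wBob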