2017-04-19
2

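Partitioning data in Azure SQL Data Warehouse tables

I am trying to understand partitioned tables in Azure SQL Data Warehouse, but what I am seeing does not make sense to me. I am clearly doing something wrong, but I cannot work out what it is.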

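My intention is to populate the first table (Marc.foo) with 10000 rows, inspect the partition metadata, and then switch a partition into a second, empty table (Marc.foo2).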

I start by creating two partitioned tables:

IF OBJECT_ID('Marc.foo', 'U') IS NOT NULL 
    DROP TABLE Marc.foo 
GO 

IF OBJECT_ID('Marc.foo2', 'U') IS NOT NULL 
    DROP TABLE Marc.foo2 
GO 

CREATE TABLE Marc.foo 
(
    id int NOT NULL 
) 
WITH 
( 
    DISTRIBUTION = HASH (id), 
    CLUSTERED COLUMNSTORE INDEX, 
    PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000)) 
) 
GO 

CREATE TABLE Marc.foo2 
(
    id int NOT NULL 
) 
WITH 
( 
    DISTRIBUTION = HASH (id), 
    CLUSTERED COLUMNSTORE INDEX, 
    PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000)) 
) 
GO 

I then populate the first table (Marc.foo) with 10000 rows:

IF OBJECT_ID('tempdb..#numbers', 'U') IS NOT NULL 
    DROP TABLE #numbers 
GO 

WITH 
    CTE_2 AS (SELECT 1 as id UNION ALL SELECT 1 as id), 
    CTE_4 AS (SELECT a.id FROM CTE_2 a, CTE_2 b), 
    CTE_16 AS (SELECT a.id FROM CTE_4 a, CTE_4 b), 
    CTE_256 AS (SELECT a.id FROM CTE_16 a, CTE_16 b), 
    CTE_64K AS (SELECT a.id FROM CTE_256 a, CTE_256 b) 
SELECT  id 
INTO  #numbers 
FROM  CTE_64K 

INSERT INTO Marc.foo(id) 
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM #numbers WHERE id <=10000 

Because I have just loaded data into the table, I create statistics on it:

CREATE STATISTICS stats_Marc_foo_id ON Marc.foo(id) WITH FULLSCAN 

Now I check the partition metadata:

SELECT  sch.name AS [schema_name], 
      tbl.[name] AS [table_name], 
      ds.type_desc, 
      prt.[partition_number], 
      rng.[value] AS [current_partition_range_boundary_value], 
      prt.[rows] AS [partition_rows] 
FROM  sys.schemas        sch 
      INNER JOIN sys.tables     tbl ON sch.schema_id  = tbl.schema_id 
      INNER JOIN sys.partitions    prt ON prt.[object_id]  = tbl.[object_id] 
      INNER JOIN sys.indexes     idx ON prt.[object_id]  = idx.[object_id] AND prt.[index_id] = idx.[index_id] 
      INNER JOIN sys.data_spaces    ds ON idx.[data_space_id] = ds.[data_space_id] 
      INNER JOIN sys.partition_schemes  ps ON ds.[data_space_id] = ps.[data_space_id] 
      INNER JOIN sys.partition_functions  pf ON ps.[function_id] = pf.[function_id] 
      LEFT JOIN sys.partition_range_values rng ON pf.[function_id] = rng.[function_id] AND rng.[boundary_id] = prt.[partition_number] 
WHERE  sch.name = 'Marc' AND 
      tbl.name = 'foo' 

Question 1: This gives me what I expect for current_partition_range_boundary_value, but partition_rows (which I expect to be 1000) returns 5957 rows for every partition.

Finally, I try to SWITCH partition 1 from Marc.foo to Marc.foo2:

ALTER TABLE Marc.foo SWITCH PARTITION 1 to Marc.foo2 PARTITION 1 

I expected that when I select from Marc.foo2 I would see 1000 rows with id values from 1 to 1000, but I get zero rows back.

Question 2: What am I doing wrong?

Answers

3

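There is a mistake in your code. Your CTE returns the number 1 for every row, which you can confirm by checking the contents of the #numbers table:

1 1 1 1 1

So your id <= 10000 criterion has no effect, and the statement always returns 65,536 rows. Fix this by moving the ROW_NUMBER into the SELECT ... INTO, for example: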

WITH 
    CTE_2 AS (SELECT 1 as id UNION ALL SELECT 1 as id), 
    CTE_4 AS (SELECT a.id FROM CTE_2 a, CTE_2 b), 
    CTE_16 AS (SELECT a.id FROM CTE_4 a, CTE_4 b), 
    CTE_256 AS (SELECT a.id FROM CTE_16 a, CTE_16 b), 
    CTE_64K AS (SELECT a.id FROM CTE_256 a, CTE_256 b) 
SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id 
INTO  #numbers 
FROM  CTE_64K 

I guess the moral of the story is: don't write your own number-generation routine without checking it :)
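One way to do that check is a quick aggregate over the temp table. This is only a sketch; the column aliases are illustrative and not part of the original code. Against the original #numbers it should report a single distinct id (all rows equal to 1), and against the corrected version it should report 65,536 distinct ids ranging from 1 to 65536.

```sql
-- Sanity check on the number table (aliases are illustrative).
-- total_rows should be 65536 in both versions; distinct_ids tells
-- you whether the ids are actually a sequence or all the same value.
SELECT  COUNT(*)           AS total_rows,
        COUNT(DISTINCT id) AS distinct_ids,
        MIN(id)            AS min_id,
        MAX(id)            AS max_id
FROM    #numbers;
```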

3

Setting the numbers table aside, here are the questions:

Question 1: This gives me what I expect for current_partition_range_boundary_value, but partition_rows (which I expect to be 1000) returns 5957 rows for every partition.

I still have not been able to get the answer I expected for this one.

Finally, I try to SWITCH partition 1 from Marc.foo to Marc.foo2:

ALTER TABLE Marc.foo SWITCH PARTITION 1 to Marc.foo2 PARTITION 1 

I expected that when I select from Marc.foo2 I would see 1000 rows with id values from 1 to 1000, but I get zero rows back.

Question 2: What am I doing wrong?

I had misunderstood RANGE RIGHT. If we look at the partition clause of the CREATE TABLE, we see:

PARTITION (id RANGE RIGHT FOR VALUES (0, 1000, 2000, 3000, 4000, 5000, 
6000, 7000, 8000, 9000))

This means that rows with an id up to but not including 0 go into partition 1, and rows with an id between 0 and 999 go into partition 2.
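To see how that mapping plays out on the actual data, the rows can be counted directly from the table rather than read from the catalog views. The query below is a sketch under the assumption that the ten boundaries are exactly those in the CREATE TABLE above; the CASE expression and the alias expected_partition_number are my own illustration of the RANGE RIGHT rule, not something the engine exposes.

```sql
-- Count rows per expected partition for RANGE RIGHT boundaries
-- (0, 1000, ..., 9000). With RANGE RIGHT each boundary value belongs
-- to the partition on its right, so:
--   partition 1  : id < 0
--   partition 2  : 0    <= id < 1000
--   ...
--   partition 11 : id >= 9000
SELECT  p.expected_partition_number,
        COUNT(*) AS row_count
FROM    (
            SELECT  CASE
                        WHEN id < 0     THEN 1
                        WHEN id >= 9000 THEN 11
                        ELSE id / 1000 + 2   -- integer division on an int column
                    END AS expected_partition_number
            FROM    Marc.foo
        ) AS p
GROUP BY p.expected_partition_number
ORDER BY p.expected_partition_number;
```

If Marc.foo holds the ids 1 through 10000, this should show no row for partition 1, 999 rows for partition 2 (ids 1 to 999), 1000 rows each for partitions 3 through 10, and 1001 rows for partition 11 (ids 9000 to 10000).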

There are no rows in partition 1. This is working as designed. If I switch partition 2, the rows will appear in Marc.foo2.
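A minimal sketch of that last step, assuming Marc.foo2 is still empty and has the same partition boundaries as Marc.foo (as in the CREATE TABLE statements above); the column aliases are illustrative:

```sql
-- Switch the first populated partition (ids 0 to 999) into the empty table.
ALTER TABLE Marc.foo SWITCH PARTITION 2 TO Marc.foo2 PARTITION 2;

-- Verify what arrived in the target table.
SELECT  COUNT(*) AS switched_rows,
        MIN(id)  AS min_id,
        MAX(id)  AS max_id
FROM    Marc.foo2;
```

Because the loaded ids start at 1 rather than 0, the check should report 999 rows with ids from 1 to 999, not the 1000 rows originally expected.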