How do I tile data so it is evenly distributed across partition boundaries?

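I have a time-based record set that needs to be loaded into a partitioned table used for staging data. The stage table is partitioned by day. For efficiency, I spread the data load into the stage table across multiple "processors" (streams in SSIS). Once the data is staged, I run a series of de-duplication operations before loading it into the data mart. My challenge is that the staged data is not evenly distributed across the processors, because I am using the NTILE function over a set partitioned by date.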

So with 5 processors I might see the following distribution...

Processor 1, >= 2011-01-01 and < 2011-05-01, Rows = 200,000 
Processor 2, >= 2011-05-01 and < 2011-09-01, Rows = 3,000,000 
Processor 3, >= 2011-09-01 and < 2012-01-01, Rows = 6,000,000 
Processor 4, >= 2012-01-01 and < 2012-05-01, Rows = 6,000,000 
Processor 5, >= 2012-05-01 and < 2012-09-01, Rows = 0

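The data volume grows exponentially, so although Processor 4 holds only 6,000,000 rows at load time today, once its full range is populated it could be working on 8,000,000+ rows (records) in total.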

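My goal is to spread the workload evenly across the processors by row count, while ensuring that no two processors contend for the same partition (day).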

So, as a visual, the distribution would need to look something like this...

Processor 1, >= 2011-01-01 and < 2011-09-01, Rows (3,200,000) 
Processor 2, >= 2011-09-01 and < 2011-11-01, Rows (3,000,000) 
Processor 3, >= 2011-11-01 and < 2012-01-01, Rows (3,000,000) 
Processor 4, >= 2012-01-01 and < 2012-01-03, Rows (3,000,000) 
Processor 5, >= 2012-01-03 and < 2012-03-18, Rows (3,000,000; 2012-03-18 contains most current data)
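
To make the goal concrete, here is a minimal sketch in Python of one possible approach, not a tested SSIS solution. It assumes the row count per day is already known (for example from a GROUP BY on the day column) and assigns each day to a processor based on the cumulative row count that precedes it. Whole days are never split, so no two processors share a partition. The names `assign_days` and `day_counts` are hypothetical, and the sample data is made up for illustration.

```python
from datetime import date, timedelta
from itertools import groupby


def assign_days(day_counts, processors):
    """Group days into contiguous ranges with roughly equal row totals.

    day_counts: iterable of (day, row_count) pairs, one pair per day.
    Returns a list of (processor, start_day, end_day_exclusive, rows).
    A processor may receive no range if a single day is very large.
    """
    days = sorted(day_counts)
    total = sum(rows for _, rows in days)
    if total == 0 or processors < 1:
        return []

    assigned = []
    rows_before = 0  # rows in all earlier days
    for day, rows in days:
        # The bucket is chosen from the cumulative offset at which this day
        # starts, so bucket numbers never decrease as the days advance.
        bucket = min(processors - 1, rows_before * processors // total)
        assigned.append((bucket, day, rows))
        rows_before += rows

    ranges = []
    for bucket, group in groupby(assigned, key=lambda item: item[0]):
        group = list(group)
        start = group[0][1]
        end_exclusive = group[-1][1] + timedelta(days=1)
        ranges.append((bucket + 1, start, end_exclusive,
                       sum(rows for _, _, rows in group)))
    return ranges


# Made-up sample: six days whose volume grows over time.
sample = [
    (date(2012, 1, 1), 1), (date(2012, 1, 2), 1), (date(2012, 1, 3), 2),
    (date(2012, 1, 4), 4), (date(2012, 1, 5), 4), (date(2012, 1, 6), 4),
]
for processor, start, end, rows in assign_days(sample, 4):
    print(f"Processor {processor}, >= {start} and < {end}, Rows = {rows}")
```

With this sample, the three light days land on Processor 1 and each heavy day gets its own processor, so every processor ends up with 4 rows. The balance can only be as fine as the largest single day, since a day is the smallest unit that can be assigned.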

Any feedback would be greatly appreciated.

Source

2012-03-18 Sean Fitzgerald