SSIS: only add changed rows

I have a project that imports all users (with all of their attributes) from an Active Directory domain into a SQL Server table. This table will be used by a Reporting Services application.

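The table model has the following columns:
- ID: an automatically generated unique identifier.
- distinguishedName: the user's LDAP distinguished name attribute.
- attribute_name: the name of the user attribute.
- attribute_value: the attribute value.
- timestamp: an automatically generated datetime value.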

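I have created an SSIS package with a Script Task containing C# code that exports all the data to a .CSV file, which is later imported into the table by a Data Flow Task. The project works without any problems, but it generates more than 2 million rows (the AD domain has about 30,000 users, each with 100-200 attributes).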

The SSIS package should run daily and import data only when a user attribute is new or an attribute value has changed.

To do this, I created a Data Flow that copies the entire table into a recordset.

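This recordset is converted into a DataTable, which a Script Component step uses to check whether the current row exists in it. If the row exists, the attribute values are compared, and the row is sent to the output only when the values differ or when the row is not found in the DataTable. Here is the code: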


public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Fill the output columns only for rows that are new or whose value changed
    bool processRow = compareValues(Row);

    if (processRow)
    {
        //Direct to output 0
        Row.OutdistinguishedName = Row.distinguishedName.ToString();
        Row.Outattributename = Row.AttributeName.ToString();
        Row.Outattributevalue.AddBlobData(System.Text.Encoding.UTF8.GetBytes(Row.AttributeValue.ToString()));
    }
}

public bool compareValues(Input0Buffer Row)
{
    //Variable declaration
    DataTable dtHostsTbl = (DataTable)Variables.dataTableTbl;
    string distinguishedName = Row.distinguishedName.ToString();
    string attribute_name = Row.AttributeName.ToString();
    string attribute_value = Row.AttributeValue.ToString();

    //Query datatable: single quotes must be doubled inside a filter expression
    //(a DN such as CN=O'Brien would otherwise break Select), and '=' avoids
    //LIKE treating characters such as * or % in the value as wildcards
    string expression = "distinguishedName = '" + distinguishedName.Replace("'", "''")
        + "' AND attribute_name = '" + attribute_name.Replace("'", "''") + "'";
    DataRow[] foundRowsHost = dtHostsTbl.Select(expression);

    //Row not found: it is a new attribute, so process it
    if (foundRowsHost.Length == 0)
    {
        return true;
    }

    //Row found: process it only if the stored value (column 2 of the recordset) differs
    return !foundRowsHost[0][2].ToString().Equals(attribute_value);
}

The code works, but it is extremely slow. Is there a better way to do this?


2015-11-20 Sergio

Here are some ideas:

Option A. (actually a combination of options)

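- Eliminate unnecessary data when querying Active Directory by using the whenChanged attribute. This alone will significantly reduce the number of records. If filtering by whenChanged is not possible, or in addition to it, consider the following steps.
- Instead of importing all existing records into a Recordset Destination, import them into a Cache Transform. Then use this cache in the Cache connection manager of two Lookup components. One Lookup verifies whether the {distinguishedName, attribute_name} combination exists (if it does not, the row is an insert). The other Lookup verifies whether the {distinguishedName, attribute_name, attribute_value} combination exists (if it does not, the row is an update or a delete/insert). This pair of Lookups should replace your "Skip rows which are in the table" Script Component.
- Evaluate whether you can reduce the column sizes of attribute_name and attribute_value. nvarchar(max) in particular is often a performance killer.
- If you cannot reduce the size of attribute_name and attribute_value, consider storing hashes of them and checking whether the hash has changed instead of comparing the values themselves.
- Remove the CSV step: send the data from the initial source that currently fills the CSV straight to the Lookups within a single Data Flow, and send the rows not found by the Lookups to your OLE DB Destination component.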
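The in-memory logic behind the Lookup pair and the hashing suggestion can be sketched in C#. This is a standalone console program (C# 9 or later, top-level statements), not SSIS Script Component code; the helper names `HashValue`, `BuildCache` and `Classify`, the choice of SHA-256 and the sample DN are assumptions for illustration. A dictionary keyed by {distinguishedName, attribute_name} plays the role of the first Lookup, and comparing the stored hash with the hash of the incoming value plays the role of the second.

```csharp
// Sketch only: standalone console program, NOT SSIS code.
// It mimics the two Lookups from Option A with an in-memory dictionary:
//   key {distinguishedName, attribute_name} missing     -> "Insert"
//   key present but hash of attribute_value differs     -> "Update"
//   key present and hash matches                        -> "Unchanged" (skip the row)
// SHA-256 is an assumed choice of hash; helper names are hypothetical.
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hex-encoded SHA-256 of a value; stored instead of the (possibly huge) value itself.
string HashValue(string value)
{
    using var sha = SHA256.Create();
    byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(value ?? string.Empty));
    return BitConverter.ToString(digest).Replace("-", "");
}

// Build the lookup cache once from the rows already stored in the table.
Dictionary<(string Dn, string Attr), string> BuildCache(
    IEnumerable<(string Dn, string Attr, string Value)> existingRows)
{
    var result = new Dictionary<(string Dn, string Attr), string>();
    foreach (var row in existingRows)
        result[(row.Dn, row.Attr)] = HashValue(row.Value);
    return result;
}

// One hash-table probe per incoming row; tuple keys compare strings ordinally (case-sensitive).
string Classify(Dictionary<(string Dn, string Attr), string> cache, string dn, string attr, string value)
{
    if (!cache.TryGetValue((dn, attr), out string storedHash))
        return "Insert";                      // first Lookup: no match
    return storedHash == HashValue(value)
        ? "Unchanged"                         // second Lookup: match, skip the row
        : "Update";                           // key exists, value changed
}

// Usage: one existing row, three incoming rows.
const string aliceDn = "CN=Alice,OU=Users,DC=example,DC=com";
var existing = BuildCache(new[] { (aliceDn, "mail", "alice@example.com") });
Console.WriteLine(Classify(existing, aliceDn, "mail", "alice@example.com"));     // Unchanged
Console.WriteLine(Classify(existing, aliceDn, "mail", "alice@new.example.com")); // Update
Console.WriteLine(Classify(existing, aliceDn, "title", "Engineer"));             // Insert
```

Because the cache is built once and each incoming row costs a single dictionary probe, this avoids re-filtering a DataTable for every row; it is roughly what a Lookup in full cache mode does for you inside SSIS. No string escaping is needed either, since values are never pasted into a filter expression.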

選項B.

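Check whether the source that reads from Active Directory is fast on its own (just run the Data Flow by itself, without any destination, to measure its performance). If you are satisfied with its performance, and if you have no objection to deleting everything in the ad_User table, simply delete and repopulate those two million rows every day. Reading everything from AD and writing it to SQL Server in the same Data Flow, without any change detection, may actually be the simplest and fastest option.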


2015-11-21 03:19:49 helix

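Thanks for your suggestions, helix. I found a simpler way to do this: I just import the new AD export into another table and use the EXCEPT command:

SELECT distinguishedName, attribute_name, attribute_value FROM dbo.ad_User
EXCEPT
SELECT distinguishedName, attribute_name, attribute_value FROM dbo.ad_User_Old

This command takes only 10 seconds. – Sergio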
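To make the semantics of that EXCEPT query concrete, here is a small C# sketch using LINQ's `Enumerable.Except` on (distinguishedName, attribute_name, attribute_value) tuples. It is an illustration only (C# 9 or later, top-level statements), assuming dbo.ad_User holds the fresh export and dbo.ad_User_Old the previous one; the `ChangedRows` helper and the sample rows are hypothetical. Note that SQL Server compares strings according to the column collation (often case-insensitive), whereas this sketch compares them ordinally.

```csharp
// Sketch only: what "new EXCEPT old" returns, expressed with LINQ.
// Value tuples compare component by component, the way EXCEPT compares whole rows.
using System;
using System.Collections.Generic;
using System.Linq;

// Rows present in newExport but absent from oldExport (new attributes and changed values).
// Like EXCEPT, Enumerable.Except also removes duplicates from the result.
List<(string Dn, string Attr, string Value)> ChangedRows(
    IEnumerable<(string Dn, string Attr, string Value)> newExport,
    IEnumerable<(string Dn, string Attr, string Value)> oldExport)
    => newExport.Except(oldExport).ToList();

var oldRows = new[]
{
    ("CN=Alice,DC=example,DC=com", "mail", "alice@example.com"),
    ("CN=Alice,DC=example,DC=com", "title", "Engineer"),
};
var newRows = new[]
{
    ("CN=Alice,DC=example,DC=com", "mail", "alice@example.com"),   // unchanged
    ("CN=Alice,DC=example,DC=com", "title", "Manager"),            // changed value
    ("CN=Alice,DC=example,DC=com", "telephoneNumber", "555-0100"), // new attribute
};

foreach (var row in ChangedRows(newRows, oldRows))
    Console.WriteLine($"{row.Attr} = {row.Value}");
// prints:
// title = Manager
// telephoneNumber = 555-0100
```

Like the query, this yields only rows that are new or changed in the fresh export; attributes that were removed from AD would need the reverse comparison (old EXCEPT new).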
