Intersect() and Except() are too slow on large collections of custom objects

I am importing data from another database.

My process is to import the data from the remote DB into a List<DataModel> named remoteData, and to import the data from the local DB into a List<DataModel> named localData.

I then use LINQ to build a list of the records that differ, so that I can update the local DB to match the data pulled from the remote DB. Like this:

var outdatedData = this.localData.Intersect(this.remoteData, new OutdatedDataComparer()).ToList();

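I then use LINQ to build a list of the records that no longer exist in remoteData but still exist in localData, so that I can delete them from the local DB.

Like this: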

var oldData = this.localData.Except(this.remoteData, new MatchingDataComparer()).ToList();

I then use LINQ to do the reverse of the above, to find the new data to add to the local DB.

Like this:

var newData = this.remoteData.Except(this.localData, new MatchingDataComparer()).ToList();

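Each collection holds about 70k records, and each of the 3 LINQ operations takes between 5 and 10 minutes to complete. How can I make this faster?

Here is the object the collections use: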

internal class DataModel 
{ 
     public string Key1{ get; set; } 
     public string Key2{ get; set; } 

     public string Value1{ get; set; } 
     public string Value2{ get; set; } 
     public byte? Value3{ get; set; } 
}

The comparer used to check for outdated records:

class OutdatedDataComparer : IEqualityComparer<DataModel> 
{ 
    public bool Equals(DataModel x, DataModel y) 
    { 
     var e = 
      string.Equals(x.Key1, y.Key1) && 
      string.Equals(x.Key2, y.Key2) && (
       !string.Equals(x.Value1, y.Value1) || 
       !string.Equals(x.Value2, y.Value2) || 
       x.Value3 != y.Value3 
       ); 
     return e; 
    } 

    public int GetHashCode(DataModel obj) 
    { 
     return 0; 
    } 
}

The comparer used to find new and old records:

internal class MatchingDataComparer : IEqualityComparer<DataModel> 
{ 
    public bool Equals(DataModel x, DataModel y) 
    { 
     return string.Equals(x.Key1, y.Key1) && string.Equals(x.Key2, y.Key2); 
    } 

    public int GetHashCode(DataModel obj) 
    { 
     return 0; 
    } 
}

Source

2013-11-01 Theo

+11

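You *really* should implement the hash code.

Hash codes are used to locate objects in a hash table, which is probably what Except and Intersect use internally to find matching objects. By returning a constant value, every object lands in the same bucket, and the search for a match degrades into a linear scan over all candidates. You need to implement 'GetHashCode' properly, based on the properties used for equality. – Lee

Correct. I added a hash code and the operation now takes one second! Thanks. – Theo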
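The effect Lee describes can be made visible by counting how often Equals is called. The following is a minimal sketch, not code from the thread: CountingComparer and HashDemo are hypothetical names introduced only for this illustration, and plain strings stand in for DataModel keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: uses the supplied hash function and counts Equals calls.
public sealed class CountingComparer : IEqualityComparer<string>
{
    private readonly Func<string, int> hash;

    public long EqualsCalls { get; private set; }

    public CountingComparer(Func<string, int> hash)
    {
        this.hash = hash;
    }

    public bool Equals(string x, string y)
    {
        EqualsCalls++;
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string obj)
    {
        return hash(obj);
    }
}

public static class HashDemo
{
    // Runs Except over two lists of n distinct keys (half of them overlapping)
    // and returns how many times the comparer's Equals method was called.
    public static long CountEqualsCalls(int n, Func<string, int> hash)
    {
        var local = Enumerable.Range(0, n).Select(i => "k" + i).ToList();
        var remote = Enumerable.Range(n / 2, n).Select(i => "k" + i).ToList();

        var comparer = new CountingComparer(hash);
        local.Except(remote, comparer).ToList();
        return comparer.EqualsCalls;
    }
}
```

Calling HashDemo.CountEqualsCalls(2000, s => 0) should report on the order of n²/2 comparisons, because every insertion has to be checked against everything already in the single bucket. Passing s => StringComparer.Ordinal.GetHashCode(s) instead should keep the count roughly proportional to n.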

By having a constant hash code you have ruined the performance. Here is the internal code that Intersect uses (obtained via a decompiler):

public static IEnumerable<TSource> Intersect<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer) 
{ 
    if (first == null) 
    { 
     throw Error.ArgumentNull("first"); 
    } 
    if (second == null) 
    { 
     throw Error.ArgumentNull("second"); 
    } 
    return Enumerable.IntersectIterator<TSource>(first, second, comparer); 
} 

private static IEnumerable<TSource> IntersectIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer) 
{ 
    Set<TSource> set = new Set<TSource>(comparer); 
    foreach (TSource current in second) 
    { 
     set.Add(current); 
    } 
    foreach (TSource current2 in first) 
    { 
     if (set.Remove(current2)) 
     { 
      yield return current2; 
     } 
    } 
    yield break; 
}

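As you can see, it uses a Set internally, so implementing the hash code properly will greatly improve its performance.

MatchingDataComparer is the easier of the two, so I will do that one for you.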

internal class MatchingDataComparer : IEqualityComparer<DataModel> 
{ 
    public MatchingDataComparer() 
    { 
     comparer = StringComparer.Ordinal; // Use whatever comparer you want.
    } 

    private readonly StringComparer comparer; 

    public bool Equals(DataModel x, DataModel y) 
    { 
     return comparer.Equals(x.Key1, y.Key1) && comparer.Equals(x.Key2, y.Key2); 
    } 

    //Based off of the advice from http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode 
    public int GetHashCode(DataModel obj) 
    {  
     unchecked // Overflow is fine, just wrap 
     { 
      int hash = 17; 
      hash = hash * 23 + comparer.GetHashCode(obj.Key1); 
      hash = hash * 23 + comparer.GetHashCode(obj.Key2); 
      return hash; 
     } 
    } 
}
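One caveat with the comparer above: StringComparer.GetHashCode throws an ArgumentNullException when given null, while StringComparer.Equals accepts null. The question does not say whether Key1 or Key2 can be null, so the following is only a sketch under the assumption that they might be. KeyHash is a hypothetical helper, not part of the answer.

```csharp
using System;

// Hypothetical helper: combines two string keys into one hash code and
// maps null to a fixed value instead of letting GetHashCode throw.
public static class KeyHash
{
    public static int Combine(string key1, string key2, StringComparer comparer)
    {
        unchecked // Overflow is fine, just wrap
        {
            int hash = 17;
            hash = hash * 23 + (key1 == null ? 0 : comparer.GetHashCode(key1));
            hash = hash * 23 + (key2 == null ? 0 : comparer.GetHashCode(key2));
            return hash;
        }
    }
}
```

GetHashCode(DataModel obj) could then simply return KeyHash.Combine(obj.Key1, obj.Key2, comparer), which stays consistent with Equals even when a key is missing.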

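You could probably reuse the hash code function from MatchingDataComparer in OutdatedDataComparer. It may not be the most "optimal" hash code [1], but it would be a "legal" one [2], and it will be much faster than the hard-coded 0.

[1] Or it might be; I am not sure how I would include the third && condition.
[2] If a.Equals(b) == true, then a.GetHashCode() == b.GetHashCode() must hold. If a.Equals(b) == false, the hash codes may be either equal or different: a.GetHashCode() == b.GetHashCode() || a.GetHashCode() != b.GetHashCode().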
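The suggestion above, hashing only the two keys in OutdatedDataComparer, might look like the sketch below. Equals is carried over from the question; only GetHashCode changes. Note that this Equals is not reflexive (an object never counts as "outdated" relative to itself), so the sketch assumes that each Key1/Key2 pair occurs at most once per list, which the thread does not state explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

internal class DataModel
{
    public string Key1 { get; set; }
    public string Key2 { get; set; }

    public string Value1 { get; set; }
    public string Value2 { get; set; }
    public byte? Value3 { get; set; }
}

internal class OutdatedDataComparer : IEqualityComparer<DataModel>
{
    private static readonly StringComparer comparer = StringComparer.Ordinal;

    // Same logic as in the question: keys match and at least one value differs.
    public bool Equals(DataModel x, DataModel y)
    {
        return comparer.Equals(x.Key1, y.Key1) &&
               comparer.Equals(x.Key2, y.Key2) &&
               (!comparer.Equals(x.Value1, y.Value1) ||
                !comparer.Equals(x.Value2, y.Value2) ||
                x.Value3 != y.Value3);
    }

    // Hash only the keys. Records that Equals treats as a match always share
    // both keys, so they are guaranteed to get the same hash code.
    public int GetHashCode(DataModel obj)
    {
        unchecked // Overflow is fine, just wrap
        {
            int hash = 17;
            hash = hash * 23 + comparer.GetHashCode(obj.Key1);
            hash = hash * 23 + comparer.GetHashCode(obj.Key2);
            return hash;
        }
    }
}
```

With this in place, localData.Intersect(remoteData, new OutdatedDataComparer()) only has to call Equals on records whose keys hash alike, instead of comparing every local record against every remote one.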

Source

2013-11-01 21:41:03

Thank you for the detailed explanation, Scott! I implemented the hash code and the operation is fast. – Theo
