Is there a faster way to filter a DataTable?

I have a DataTable with 2.5M rows, and I want to filter out some of its rows. Is there a faster way to filter a DataTable?

Columns:

[IntCode] long 
[BDIntCode] long 
[TxnDT] DateTime 
[TxnQuantity] decimal 
[RecordUser] long 
[RecordDT] DateTime

My code looks like this:

foreach (var down in breakDowns)
{
    sw.Start();
    var relatedBreakDowns = firstGroup.Where(x => x.RelatedBDIntCode == down.ProcessingRowIntCode).ToList();
    if (relatedBreakDowns.Count == 0) continue;

    var filters = string.Format("BDIntCode IN ({0})", string.Join(",", relatedBreakDowns.Select(x => x.BDIntCode)));
    var filteredDatatable = datatable.Select(filters, "BDIntCode");
    foreach (var dataRow in filteredDatatable)
    {
        var r = dataTableSchema.NewRow();
        r["RecordUser"] = recordUser;
        r["RecordDT"] = DateTime.Now;
        r["TxnQuantity"] = dataRow["TxnQuantity"];
        r["TxnDT"] = dataRow["TxnDT"];
        r["BDIntCode"] = down.ProcessingRowIntCode;
        dataTableSchema.Rows.Add(r);
    }
    sw.Stop();
    count++;
    Console.WriteLine("Group: " + unrelatedBreakDownGroup.RelatedBDGroupIntCode + ", Count : " + count + ", ElapsedTime : ms = " + sw.ElapsedMilliseconds + ", sec = " + sw.ElapsedMilliseconds/1000f);
    sw.Reset();
}

The breakDowns list contains 1805 items, and the firstGroup list contains 9880 items.

2014-06-20 sinanakyazici

First question: why do you have this in a 'DataTable' at all?

I have read your answer. I am using a framework that returns a DataTable, but I can change that. – sinanakyazici

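To be fair, most of what is in my answer would work fine with a DataTable; it is just less convenient and carries some unnecessary overhead. That is not the biggest problem, though.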

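Personally, I would start with a List<SomeType> rather than a DataTable. Then I would index the data: in your case, you are searching by RelatedBDIntCode and expecting multiple matches, so: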

var index = firstGroup.ToLookup(x => x.RelatedBDIntCode); 
foreach (var down in breakDowns) { 
    var matches = index[down.ProcessingRowIntCode].ToList(); 
    //... 
}

This avoids a full scan of firstGroup for every item in breakDowns.

Next, the IN could probably be moved to a similar indexed lookup, this time presumably on BDIntCode.
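As a rough sketch of what that second index might look like (this is not code from the original answer), the rows of the DataTable could be grouped by BDIntCode once, so that each lookup replaces a Select call with an IN filter. The class and method names are hypothetical, and the sketch assumes BDIntCode is stored as a non-null long:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class BdIntCodeIndex
{
    // Groups every row by its BDIntCode in a single pass over the table.
    public static ILookup<long, DataRow> Build(DataTable table)
    {
        return table.AsEnumerable().ToLookup(row => row.Field<long>("BDIntCode"));
    }

    // Returns the rows for the given codes, ordered by code.
    // A code with no rows yields an empty sequence, so no key check is needed.
    public static List<DataRow> RowsFor(ILookup<long, DataRow> index, IEnumerable<long> codes)
    {
        return codes.Distinct()
                    .OrderBy(code => code)
                    .SelectMany(code => index[code])
                    .ToList();
    }
}
```

Inside the loop, something like BdIntCodeIndex.RowsFor(rowIndex, matches.Select(x => x.BDIntCode)) would then stand in for datatable.Select(filters, "BDIntCode"), with rowIndex built once before the loop starts.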

2014-06-20 09:12:21

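Just to elaborate on Marc's answer: you should try to reduce the number of iterations your code performs.

As currently written, your code loops over the breakDowns collection 1805 times, and on each of those iterations it scans all 9880 items of the firstGroup collection, for a total of 17,833,400 iterations, not counting the DataTable filtering.

So your approach should be to index the data up front in order to reduce the number of iterations performed.

A first step could be to build a Dictionary that maps each RelatedBDIntCode to the matching rows in datatable. You can then iterate over breakDowns and pull out the mapped rows for each down, like this: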

var dtIndexed =
    firstGroup
        .GroupBy(x => x.RelatedBDIntCode)
        .ToDictionary(
            x => x.Key, // the RelatedBDIntCode you'll be selecting with
            x => // the mapped rows. This is the same method of filtering, but you could try others
            {
                var filters = string.Format("BDIntCode IN ({0})", string.Join(",", x.Select(y => y.BDIntCode)));
                return datatable.Select(filters, "BDIntCode");
            });

foreach (var down in breakDowns)
{
    if (!dtIndexed.ContainsKey(down.ProcessingRowIntCode)) continue;

    var rows = dtIndexed[down.ProcessingRowIntCode];

    foreach (var row in rows)
    {
        var r = dataTableSchema.NewRow();
        r["RecordUser"] = recordUser;
        r["RecordDT"] = DateTime.Now;
        r["TxnQuantity"] = row["TxnQuantity"];
        r["TxnDT"] = row["TxnDT"];
        r["BDIntCode"] = down.ProcessingRowIntCode;
        dataTableSchema.Rows.Add(r);
    }
}

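This approach should reduce the number of iterations your code performs and therefore improve performance.

Note that in the code above I used exactly the same method to filter the DataTable, i.e. datatable.Select(filter, order). You could also try datatable.AsEnumerable().Where(row => ...).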
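A minimal sketch of that alternative, assuming BDIntCode is a non-null long and using a hypothetical helper name. Putting the wanted codes in a HashSet keeps each row test cheap, but this is still a full pass over the table on every call, so it is worth measuring against Select on real data:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableFilter
{
    // Rough LINQ-to-DataSet counterpart of
    // datatable.Select("BDIntCode IN (...)", "BDIntCode").
    public static DataRow[] WhereBdIntCodeIn(DataTable table, IEnumerable<long> codes)
    {
        // Hash the wanted codes once so each row is tested with a set lookup.
        var wanted = new HashSet<long>(codes);

        return table.AsEnumerable()
                    .Where(row => wanted.Contains(row.Field<long>("BDIntCode")))
                    .OrderBy(row => row.Field<long>("BDIntCode"))
                    .ToArray();
    }
}
```

In the dictionary-building code above, DataTableFilter.WhereBdIntCodeIn(datatable, x.Select(y => y.BDIntCode)) could be swapped in for the Select call to compare the two.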

2014-06-20 12:40:48 devduder
