LINQ batching with multiple groupings

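I have a CSV file whose records need to be sorted and then grouped into batches of an arbitrary size (for example, at most 300 records per batch). A batch may hold fewer than 300 records, because the contents of each batch must be homogeneous (based on the contents of several different columns).

My LINQ statement, inspired by this answer on batching with LINQ, looks like this: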

var query = (from line in EbrRecords 
      let EbrData = line.Split('\t') 
      let Location = EbrData[7] 
      let RepName = EbrData[4] 
      let AccountID = EbrData[0] 
      orderby Location, RepName, AccountID). 
      Select((data, index) => new { 
       Record = new EbrRecord(
       AccountID = EbrData[0], 
       AccountName = EbrData[1], 
       MBSegment = EbrData[2], 
       RepName = EbrData[4], 
       Location = EbrData[7], 
       TsrLocation = EbrData[8] 
       ) 
       , 
       Index = index} 
       ).GroupBy(x => new {x.Record.Location, x.Record.RepName, batch = x.Index/100});

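The "/ 100" gives me my arbitrary bucket size. The other elements of the GroupBy are meant to enforce homogeneity within each batch. I suspect this is almost what I want, but it gives me the following compiler error: A query body must end with a select clause or a group clause. I understand why I am getting the error, but overall I am not sure how to fix this query. How would it be done?

UPDATE: I am very close to what I am after with the following: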

List<EbrRecord> input = new List<EbrRecord> { 
    new EbrRecord {Name = "Brent",Age = 20,ID = "A"}, 
    new EbrRecord {Name = "Amy",Age = 20,ID = "B"}, 
    new EbrRecord {Name = "Gabe",Age = 23,ID = "B"}, 
    new EbrRecord {Name = "Noah",Age = 27,ID = "B"}, 
    new EbrRecord {Name = "Alex",Age = 27,ID = "B"}, 
    new EbrRecord {Name = "Stormi",Age = 27,ID = "B"}, 
    new EbrRecord {Name = "Roger",Age = 27,ID = "B"}, 
    new EbrRecord {Name = "Jen",Age = 27,ID = "B"}, 
    new EbrRecord {Name = "Adrian",Age = 28,ID = "B"}, 
    new EbrRecord {Name = "Cory",Age = 29,ID = "C"}, 
    new EbrRecord {Name = "Bob",Age = 29,ID = "C"}, 
    new EbrRecord {Name = "George",Age = 29,ID = "C"}, 
    }; 

//look how tiny this query is, and it is very nearly the result I want!!! 
int i = 0; 
var result = from q in input 
       orderby q.Age, q.ID 
       group q by new { q.ID, batch = i++/3 }; 

foreach (var agroup in result) 
{ 
    Debug.WriteLine("ID:" + agroup.Key); 
    foreach (var record in agroup) 
    { 
     Debug.WriteLine(" Name:" + record.Name); 
    } 
}

The trick here is to bypass the "index position" overload of Select by using a closure variable (int i in this case). The output is as follows:

ID:{ ID = A, batch = 0 } 
Name:Brent 
ID:{ ID = B, batch = 0 } 
Name:Amy 
Name:Gabe 
ID:{ ID = B, batch = 1 } 
Name:Noah 
Name:Alex 
Name:Stormi 
ID:{ ID = B, batch = 2 } 
Name:Roger 
Name:Jen 
Name:Adrian 
ID:{ ID = C, batch = 3 } 
Name:Cory 
Name:Bob 
Name:George

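Although this result is acceptable, it falls just short of the ideal. The first occurrence of "batch B" should have 3 entries (Amy, Gabe, Noah), not two (Amy, Gabe). This happens because the index position is not reset when each new group is identified. Does anyone know how to reset my custom index position for each group?

UPDATE 2: I think I may have found the answer. First, an additional function like this: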

public static bool BatchGroup(string ID, ref string priorID) 
    { 
     if (priorID != ID) 
     { 
      priorID = ID; 
      return true; 
     } 
     return false; 
    }

Second, update the LINQ query like this:

int i = 0; 
string priorID = null; 
var result = from q in input 
       orderby q.Age, q.ID 
      group q by new { q.ID, batch = (BatchGroup(q.ID, ref priorID) ? i=0 : ++i)/3 };

Now it does what I want. I just wish I did not need that separate function!


2011-06-02 Brent Arias

orderby Location, RepName, AccountID

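needs a select clause after it, as demonstrated in StriplingWarrior's answer. LINQ comprehension queries must end with select or group by.

Unfortunately, there is also a logic flaw. Suppose I have 50 accounts in the first group and 100 accounts in the second group, with a batch size of 100. The original code will produce 3 batches of size 50, rather than 2 batches of sizes 50 and 100.

Here is one way to fix it.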
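The arithmetic behind that flaw can be checked with a small sketch. It is not code from the question: `GlobalIndexFlaw`, `BatchSizes` and `FiftyThenHundred` are hypothetical names, and plain strings stand in for the sorted records. It computes the batch number from one index running over the whole sequence, as the original query does, and reports the size of each resulting batch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GlobalIndexFlaw
{
    // Hypothetical helper: sizes of the batches produced when the batch number
    // comes from a single index that runs across ALL records (never reset).
    public static List<int> BatchSizes(IEnumerable<string> sortedGroupKeys, int batchSize)
    {
        return sortedGroupKeys
            .Select((key, index) => new { Key = key, Index = index })
            // Anonymous types compare by value, so (Key, Batch) works as a composite key.
            .GroupBy(x => new { x.Key, Batch = x.Index / batchSize })
            .Select(g => g.Count())
            .ToList();
    }

    // The scenario from the text: 50 records in group "A", then 100 in group "B".
    public static List<int> FiftyThenHundred()
    {
        IEnumerable<string> keys =
            Enumerable.Repeat("A", 50).Concat(Enumerable.Repeat("B", 100));
        return BatchSizes(keys, 100);
    }
}
```

Group "B" starts at index 50, so its first 50 records fall into batch 0 and the remaining 50 into batch 1. Together with the 50 records of group "A", `FiftyThenHundred()` yields three batches of 50.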

IEnumerable<IGrouping<int, EbrRecord>> query = ... 

    orderby Location, RepName, AccountID 
    select new EbrRecord {
    AccountID = EbrData[0], 
    AccountName = EbrData[1], 
    MBSegment = EbrData[2], 
    RepName = EbrData[4], 
    Location = EbrData[7], 
    TsrLocation = EbrData[8] } into x 
    // group by the homogeneity columns first...
    group x by new {Location = x.Location, RepName = x.RepName} into g 
    // ...then number the records within each group, so the index restarts at 0 per group
    from g2 in g.Select((data, index) => new { Record = data, Index = index }) 
       .GroupBy(y => y.Index/100, y => y.Record) 
    select g2; 


List<List<EbrRecord>> result = query.Select(g => g.ToList()).ToList();

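Also note that batching with GroupBy is slow because of the extra iterations. You could write a for loop that makes a single pass over the ordered collection, and that loop would run much faster than LinqToObjects.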
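A minimal sketch of such a single-pass loop follows. It is one possible reading of the suggestion, not code from this answer: `SinglePassBatcher` and `Batch` are hypothetical names, and it assumes the input is already sorted so that records with equal keys are adjacent. A new batch starts whenever the key changes or the current batch is full.

```csharp
using System;
using System.Collections.Generic;

public static class SinglePassBatcher
{
    // Splits an already-sorted sequence into batches of at most maxSize items.
    // Every batch contains items with one key only.
    public static List<List<T>> Batch<T, TKey>(
        IEnumerable<T> sorted, Func<T, TKey> keySelector, int maxSize)
    {
        if (maxSize < 1) throw new ArgumentOutOfRangeException("maxSize");

        EqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;
        var result = new List<List<T>>();
        List<T> current = null;            // batch currently being filled
        TKey currentKey = default(TKey);   // key shared by the items in 'current'

        foreach (T item in sorted)
        {
            TKey key = keySelector(item);
            bool startNew = current == null
                || current.Count >= maxSize
                || !comparer.Equals(key, currentKey);
            if (startNew)
            {
                current = new List<T>();
                result.Add(current);
                currentKey = key;
            }
            current.Add(item);
        }
        return result;
    }
}
```

With the records from the question, a call might look like `SinglePassBatcher.Batch(records, r => new { r.Location, r.RepName }, 100)`. Anonymous types compare by value, so the default comparer is sufficient for that composite key. For 50 records in one group followed by 100 in the next, this produces two batches of 50 and 100.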


2011-06-02 19:20:23

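My IntelliSense and compiler refuse to let me put "group by" after "select new" unless I switch to dot notation. – 2011-06-02 19:33:35

The "into x" is important. – 2011-06-02 19:33:57

Fixed many embarrassing typos. Now I am done (whether it works or not). – 2011-06-02 19:35:42

Wouldn't this work?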

var query = (from line in EbrRecords 
     let EbrData = line.Split('\t') 
     let Location = EbrData[7] 
     let RepName = EbrData[4] 
     let AccountID = EbrData[0] 
     orderby Location, RepName, AccountID 
     select new EbrRecord {
       AccountID = EbrData[0], 
       AccountName = EbrData[1], 
       MBSegment = EbrData[2], 
       RepName = EbrData[4], 
       Location = EbrData[7], 
       TsrLocation = EbrData[8] } 
     ).Select((data, index) => new 
     { 
      Record = data, 
      Index = index 
     }) 
     .GroupBy(x => new {x.Record.Location, x.Record.RepName, batch = x.Index/100}, 
      x => x.Record);


2011-06-02 19:00:02 StriplingWarrior

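What I expect is a list of EbrRecord lists (a list of lists). But the above gives me a list of anonymous types that contain only Location, RepName, and batch. I am wondering whether the post I linked to really does what I thought or hoped. – 2011-06-02 19:26:10

@Brent: GroupBy will create an IEnumerable of 'IGrouping's. Each of them has a Key with Location, RepName, and batch, but is itself also an IEnumerable that contains the selected values. If you use the overload in my updated answer, you should have an 'IEnumerable<IGrouping<TKey, EbrRecord>>', where TKey is the anonymous key type. I don't think it will do what you expect, though. Be sure to read David B's answer. He makes some excellent points. – StriplingWarrior 2011-06-02 20:44:00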
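To illustrate the shape described in that comment, here is a small sketch. The `Rec` class and the `GroupingShape.AccountsPerRep` method are hypothetical stand-ins, not the asker's `EbrRecord`. It uses the GroupBy overload that takes both a key selector and an element selector, and shows that each grouping carries a Key and can also be enumerated to obtain the selected elements.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type used only for this illustration.
public sealed class Rec
{
    public string Location;
    public string RepName;
    public string AccountID;
}

public static class GroupingShape
{
    // Groups records by (Location, RepName) and keeps only the AccountID
    // as the element, then flattens every grouping into a plain list.
    public static List<List<string>> AccountsPerRep(IEnumerable<Rec> recs)
    {
        var groups = recs.GroupBy(
            r => new { r.Location, r.RepName },  // key selector: anonymous composite key
            r => r.AccountID);                   // element selector: what each group contains

        var result = new List<List<string>>();
        foreach (var g in groups)
        {
            // g.Key holds Location and RepName; enumerating g yields the AccountIDs.
            result.Add(g.ToList());
        }
        return result;
    }
}
```

The returned list of lists contains the elements only. The keys remain available on each grouping while iterating, if they are needed.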
