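I have a CSV file whose records I need to sort and then group into batches of an arbitrary size (for example, at most 300 records per batch). A batch may contain fewer than 300 records, because the contents of each batch must be homogeneous (based on the contents of a couple of different columns).
My LINQ statement, inspired by this answer on batching with LINQ, looks like this: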
var query = (from line in EbrRecords
             let EbrData = line.Split('\t')
             let Location = EbrData[7]
             let RepName = EbrData[4]
             let AccountID = EbrData[0]
             orderby Location, RepName, AccountID).
            Select((data, index) => new {
                Record = new EbrRecord(
                    AccountID = EbrData[0],
                    AccountName = EbrData[1],
                    MBSegment = EbrData[2],
                    RepName = EbrData[4],
                    Location = EbrData[7],
                    TsrLocation = EbrData[8]
                )
                ,
                Index = index}
            ).GroupBy(x => new {x.Record.Location, x.Record.RepName, batch = x.Index/100});
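The "/ 100" gives me the arbitrary bucket size. The other elements of the GroupBy key are there to keep each batch homogeneous. I suspect this is almost what I want, but it gives me the following compiler error: A query body must end with a select clause or a group clause.
I understand why I am getting the error, but overall I am not sure how to fix this query. How would it be done?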
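One possible way past the compiler error, sketched under some assumptions: end the query expression with a `select` that builds the record, and only then chain the indexed `Select` and the `GroupBy` in method syntax. The `let` variables are not in scope inside the later `Select` lambda, which is why the record is built in the `select` clause here. The `EbrRecord` property list, the `BatchSketch.BuildBatches` name and the `batchSize` parameter are hypothetical, introduced only for illustration; the column positions are taken from the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type: only the properties used in the question.
public class EbrRecord
{
    public string AccountID { get; set; }
    public string AccountName { get; set; }
    public string MBSegment { get; set; }
    public string RepName { get; set; }
    public string Location { get; set; }
    public string TsrLocation { get; set; }
}

public static class BatchSketch
{
    // Hypothetical helper: sorts tab-separated lines, then buckets them.
    public static List<List<EbrRecord>> BuildBatches(IEnumerable<string> ebrRecords, int batchSize)
    {
        var sorted = from line in ebrRecords
                     let EbrData = line.Split('\t')
                     let Location = EbrData[7]
                     let RepName = EbrData[4]
                     let AccountID = EbrData[0]
                     orderby Location, RepName, AccountID
                     // A query expression has to end with select or group.
                     select new EbrRecord
                     {
                         AccountID = AccountID,
                         AccountName = EbrData[1],
                         MBSegment = EbrData[2],
                         RepName = RepName,
                         Location = Location,
                         TsrLocation = EbrData[8]
                     };

        return sorted
            // Indexed Select overload: index is the position in the sorted sequence.
            .Select((record, index) => new { Record = record, Index = index })
            .GroupBy(x => new { x.Record.Location, x.Record.RepName, batch = x.Index / batchSize },
                     x => x.Record)
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Because the index runs over the whole sorted sequence rather than restarting for each Location/RepName pair, a batch can come out smaller than `batchSize` whenever a group boundary falls in the middle of a bucket.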
UPDATE: I am very close to what I'm after, with the following:
List<EbrRecord> input = new List<EbrRecord> {
    new EbrRecord {Name = "Brent",Age = 20,ID = "A"},
    new EbrRecord {Name = "Amy",Age = 20,ID = "B"},
    new EbrRecord {Name = "Gabe",Age = 23,ID = "B"},
    new EbrRecord {Name = "Noah",Age = 27,ID = "B"},
    new EbrRecord {Name = "Alex",Age = 27,ID = "B"},
    new EbrRecord {Name = "Stormi",Age = 27,ID = "B"},
    new EbrRecord {Name = "Roger",Age = 27,ID = "B"},
    new EbrRecord {Name = "Jen",Age = 27,ID = "B"},
    new EbrRecord {Name = "Adrian",Age = 28,ID = "B"},
    new EbrRecord {Name = "Cory",Age = 29,ID = "C"},
    new EbrRecord {Name = "Bob",Age = 29,ID = "C"},
    new EbrRecord {Name = "George",Age = 29,ID = "C"},
};
//look how tiny this query is, and it is very nearly the result I want!!!
int i = 0;
var result = from q in input
             orderby q.Age, q.ID
             group q by new { q.ID, batch = i++/3 };
foreach (var agroup in result)
{
    Debug.WriteLine("ID:" + agroup.Key);
    foreach (var record in agroup)
    {
        Debug.WriteLine(" Name:" + record.Name);
    }
}
The trick here is to bypass the Select "index position" overload by using a closure variable (int i in this case). The output is as follows:
ID:{ ID = A, batch = 0 }
Name:Brent
ID:{ ID = B, batch = 0 }
Name:Amy
Name:Gabe
ID:{ ID = B, batch = 1 }
Name:Noah
Name:Alex
Name:Stormi
ID:{ ID = B, batch = 2 }
Name:Roger
Name:Jen
Name:Adrian
ID:{ ID = C, batch = 3 }
Name:Cory
Name:Bob
Name:George
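While this result is acceptable, it falls just short of the ideal one. The first "B" batch should have 3 entries (Amy, Gabe, Noah), not two (Amy, Gabe). This happens because the index position is not reset when a new group is identified. Does anyone know how to reset my custom index position for each group?
UPDATE 2: I think I may have found the answer. First, add an extra function like this: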
public static bool BatchGroup(string ID, ref string priorID)
{
    if (priorID != ID)
    {
        priorID = ID;
        return true;
    }
    return false;
}
Second, update the LINQ query like this:
int i = 0;
string priorID = null;
var result = from q in input
             orderby q.Age, q.ID
             group q by new { q.ID, batch = (BatchGroup(q.ID, ref priorID) ? i=0 : ++i)/3 };
Now it does what I want. I just wish I didn't need that separate function!
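For what it's worth, one way to avoid both the helper function and the mutable counter might be to group by ID first and then number the records inside each group with the indexed `Select` overload, so the index restarts at 0 for every ID. This is only a sketch: the `PerGroupBatching.Batch` name, the `batchSize` parameter and the "ID/batch" string key are assumptions made for illustration, and the `EbrRecord` shape is the one from the sample data above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the sample records in the update above.
public class EbrRecord
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string ID { get; set; }
}

public static class PerGroupBatching
{
    // Hypothetical helper: returns (key, names) pairs, where key is "ID/batchNumber".
    public static List<KeyValuePair<string, List<string>>> Batch(
        IEnumerable<EbrRecord> input, int batchSize)
    {
        return input
            .OrderBy(q => q.Age).ThenBy(q => q.ID)
            // Group by ID first, so each ID gets its own index sequence.
            .GroupBy(q => q.ID)
            .SelectMany(byId => byId
                // The index here restarts at 0 for every ID group.
                .Select((record, index) => new { record, batch = index / batchSize })
                .GroupBy(x => x.batch, x => x.record.Name)
                .Select(b => new KeyValuePair<string, List<string>>(
                    byId.Key + "/" + b.Key, b.ToList())))
            .ToList();
    }
}
```

With the twelve sample records and a batch size of 3, this should give the batches A/0 (Brent), B/0 (Amy, Gabe, Noah), B/1 (Alex, Stormi, Roger), B/2 (Jen, Adrian) and C/0 (Cory, Bob, George). One difference from the BatchGroup approach: GroupBy collects all records with the same ID even when they are not adjacent after sorting, whereas the counter-based version resets whenever the ID changes from one record to the next.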
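My IntelliSense and compiler refuse to let me place "group by" after "select new" unless I switch to dot notation. – 2011-06-02 19:33:35
The "into x" is important. – 2011-06-02 19:33:57
Fixed many embarrassing typos. Now I'm done (whether it works or not). – 2011-06-02 19:35:42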