2015-04-17 166 views
Aggregation: grouping nested documents

Let's say I have the following 5 documents:

{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] } 
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] } 
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] } 
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] } 
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] } 

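I want to query the collection so that it returns groups of students (their _ids), grouped by the set (combination) of courses they take, and counts how many students are in each group.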

In the example above there are 3 groups (combinations) of courses, with the following numbers of students:

1 - [ "A", "B" ] <- 2 students take this combination

2 - [ "A", "B", "C" ] <- 2 students

3 - [ "A", "B", "D" ] <- 1 student

I feel this is more of a MapReduce task than an Aggregation one... not sure...

UPDATE 1

Many thanks to @ExplosionPills.

The following aggregation command:

db.students.aggregate([{ 
    $group: { 
     _id: "$courses", 
     count: {$sum: 1}, 
     students: {$push: "$_id"} 
    } 
}]) 

gives me the following output:

{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] } 
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] } 
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] } 

It groups by the set of courses and, for each set, counts the students who take it and collects their _ids.
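
To make the grouping logic concrete, here is a minimal plain-JavaScript sketch of what a `$group` stage with `$sum` and `$push` computes. It runs outside MongoDB, on the five sample documents from the question; `groupByCourses` is a hypothetical helper written for illustration, not a MongoDB API, and the order of the groups it returns need not match what the server prints.

```javascript
// Sample documents copied from the question.
const students = [
  { _id: "1", student: "Oscar", courses: ["A", "B"] },
  { _id: "2", student: "Alan", courses: ["A", "B", "C"] },
  { _id: "3", student: "Kate", courses: ["A", "B", "D"] },
  { _id: "4", student: "John", courses: ["A", "B", "C"] },
  { _id: "5", student: "Bema", courses: ["A", "B"] },
];

// Hypothetical helper (not a MongoDB API): groups documents by the exact
// contents AND order of their courses array, like _id: "$courses" does.
function groupByCourses(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(doc.courses); // order-sensitive key
    if (!groups.has(key)) {
      groups.set(key, { _id: doc.courses, count: 0, students: [] });
    }
    const group = groups.get(key);
    group.count += 1;             // plays the role of {$sum: 1}
    group.students.push(doc._id); // plays the role of {$push: "$_id"}
  }
  return [...groups.values()];
}

const grouped = groupByCourses(students);
console.log(JSON.stringify(grouped));
```

Because the key is built from the array as stored, two arrays with the same courses in a different order would end up in different groups.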

UPDATE 2

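I found that the aggregation above treats the combination [ "C", "A", "B" ] as different from [ "A", "B", "C" ]. But I need these two to count as the same combination.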

So let's look at the following documents:

{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] } 
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] } 
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] } 
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] } 
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] } 
{ "_id" : "6", "student" : "Alex", "courses" : [ "C", "A", "B" ] } 

And here is the output:

{ "_id" : [ "C", "A", "B" ], "count" : 1, "students" : [ "6" ] } 
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] } 
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] } 
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] } 

See lines 1 and 3 - this is not what I want.

So, in order to treat [ "C", "A", "B" ] as the same combination as [ "A", "B", "C" ], I changed the aggregation as follows:

db.students.aggregate([ 
    {$unwind: "$courses" }, 
    {$sort : {"courses": 1}}, 
    {$group: {_id: "$_id", courses: {$push: "$courses"}}}, 
    {$group: {_id: "$courses", count: {$sum:1}, students: {$push: "$_id"}}} 
    ]) 

Output:

{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] } 
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "5", "1" ] } 
{ "_id" : [ "A", "B", "C" ], "count" : 3, "students" : [ "6", "4", "2" ] } 
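
The idea behind the `$unwind` / `$sort` / `$group` / `$group` pipeline is to bring each student's courses into a canonical order before using them as the group key. A minimal plain-JavaScript sketch of that idea, run outside MongoDB on the six documents above (`groupByCourseSet` is a hypothetical helper, not a MongoDB API, and the order of groups and of students within a group may differ from the server output):

```javascript
// Sample documents copied from UPDATE 2, including Alex's reordered courses.
const studentsWithAlex = [
  { _id: "1", student: "Oscar", courses: ["A", "B"] },
  { _id: "2", student: "Alan", courses: ["A", "B", "C"] },
  { _id: "3", student: "Kate", courses: ["A", "B", "D"] },
  { _id: "4", student: "John", courses: ["A", "B", "C"] },
  { _id: "5", student: "Bema", courses: ["A", "B"] },
  { _id: "6", student: "Alex", courses: ["C", "A", "B"] },
];

// Hypothetical helper: sorts a copy of each courses array first, so that
// ["C", "A", "B"] and ["A", "B", "C"] produce the same group key.
function groupByCourseSet(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const sorted = doc.courses.slice().sort(); // copy, then sort
    const key = JSON.stringify(sorted);
    if (!groups.has(key)) {
      groups.set(key, { _id: sorted, count: 0, students: [] });
    }
    const group = groups.get(key);
    group.count += 1;
    group.students.push(doc._id);
  }
  return [...groups.values()];
}

const groupedBySet = groupByCourseSet(studentsWithAlex);
console.log(JSON.stringify(groupedBySet));
```

Sorting a copy with `slice()` leaves the original documents untouched, which keeps the sketch free of side effects.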

Answer

This is an aggregation operation that uses grouping.

db.students.aggregate([{ 
    $group: { 
     // The group key: documents with the same courses array fall into one group. 
     // The $ prefix refers to the value of that field in each document 
     _id: "$courses", 

     // Add 1 for each document in the group (effectively a counter) 
     count: {$sum: 1} 
    } 
}]); 

Edit:

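If the courses can be in any order, you can $unwind, $sort, and then $group again, as suggested in the edited question. This can also be done with mapReduce, but I'm not sure which is faster.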

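db.students.mapReduce(
    function() { 
     // Use a sorted copy of the courses as the key, and emit a value 
     // with the same shape that reduce returns 
     emit(this.courses.slice().sort(), {students: [this._id], count: 1}); 
    }, 
    function (key, values) { 
     // reduce may run more than once per key, so merge partial results 
     var result = {students: [], count: 0}; 
     values.forEach(function (v) { 
      result.students = result.students.concat(v.students); 
      result.count += v.count; 
     }); 
     return result; 
    }, 
    {out: {inline: 1}} 
) 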
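
One point worth checking with mapReduce: the MongoDB documentation states that the reduce function may be called more than once for the same key, and is not called at all for a key with a single value, so the value reduce returns should have the same shape as the values map emits. The following plain-JavaScript sketch simulates that behaviour outside MongoDB. All names (`mapStudent`, `reduceGroup`, `simulateMapReduce`) are hypothetical, the key is a joined string rather than an array for simplicity, and `emit` is passed in as a parameter instead of being a global as it is in the shell.

```javascript
const enrolled = [
  { _id: "1", courses: ["A", "B"] },
  { _id: "2", courses: ["A", "B", "C"] },
  { _id: "3", courses: ["A", "B", "D"] },
  { _id: "4", courses: ["A", "B", "C"] },
  { _id: "5", courses: ["A", "B"] },
  { _id: "6", courses: ["C", "A", "B"] },
];

// Map step: emit a value that already has the shape reduce returns.
function mapStudent(doc, emit) {
  emit(doc.courses.slice().sort().join(","), { students: [doc._id], count: 1 });
}

// Reduce step: merges values, whether raw emits or earlier reduce results.
function reduceGroup(key, values) {
  const merged = { students: [], count: 0 };
  for (const v of values) {
    merged.students = merged.students.concat(v.students);
    merged.count += v.count;
  }
  return merged;
}

// Toy driver (an assumption, not MongoDB's actual scheduling): reduces two
// values at a time and feeds partial results back in; keys with a single
// value skip reduce entirely.
function simulateMapReduce(docs, map, reduce) {
  const emitted = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    });
  }
  const results = [];
  for (const [key, values] of emitted) {
    let pending = values;
    while (pending.length > 1) {
      const next = [];
      for (let i = 0; i < pending.length; i += 2) {
        const chunk = pending.slice(i, i + 2);
        next.push(chunk.length > 1 ? reduce(key, chunk) : chunk[0]);
      }
      pending = next;
    }
    results.push({ _id: key, value: pending[0] });
  }
  return results;
}

const mrResults = simulateMapReduce(enrolled, mapStudent, reduceGroup);
console.log(JSON.stringify(mrResults));
```

With values shaped this way, a group with one student and a group reduced several times both end up as `{students: [...], count: n}`.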

Thanks! Is it possible to output the list of the students' `_id`s as a nested property of each group? – Askar

You can add another field to the grouping, such as `students: {$push: "$_id"}` –

Thank you very much! Great! :) – Askar