2016-07-15

MongoDB duplicate count problem

I have a collection with the data below (the collection contains more than 100,000 records):

> db.LogBuff.find() 
{ "_id" : ObjectId("578899d5d2b76f77d083f16c"), "SUBJECT" : "DD", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f16d"), "SUBJECT" : "AA", "SYS" : "B" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f16e"), "SUBJECT" : "BB", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f16f"), "SUBJECT" : "AA", "SYS" : "C" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f170"), "SUBJECT" : "BB", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f171"), "SUBJECT" : "BB", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f172"), "SUBJECT" : "CC", "SYS" : "B" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f173"), "SUBJECT" : "AA", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f174"), "SUBJECT" : "CC", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f175"), "SUBJECT" : "DD", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f176"), "SUBJECT" : "AA", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f177"), "SUBJECT" : "BB", "SYS" : "C" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f178"), "SUBJECT" : "CC", "SYS" : "D" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f179"), "SUBJECT" : "DD", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f17a"), "SUBJECT" : "AA", "SYS" : "B" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f17b"), "SUBJECT" : "BB", "SYS" : "B" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f17c"), "SUBJECT" : "AA", "SYS" : "A" } 
{ "_id" : ObjectId("578899d5d2b76f77d083f17d"), "SUBJECT" : "CC", "SYS" : "C" } 

I would like to get output like the following:

{ "_id" : { "SUBJECT" : "CC", "SYS" : "C" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "DD", "SYS" : "A" }, "COUNT" : 3 } 
{ "_id" : { "SUBJECT" : "AA", "SYS" : "B" }, "COUNT" : 2 } 
{ "_id" : { "SUBJECT" : "AA", "SYS" : "C" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "CC", "SYS" : "B" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "BB", "SYS" : "A" }, "COUNT" : 3 } 
{ "_id" : { "SUBJECT" : "BB", "SYS" : "C" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "AA", "SYS" : "A" }, "COUNT" : 3 } 
{ "_id" : { "SUBJECT" : "CC", "SYS" : "A" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "CC", "SYS" : "D" }, "COUNT" : 1 } 
{ "_id" : { "SUBJECT" : "BB", "SYS" : "B" }, "COUNT" : 1 } 

This is my code:

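db.LogBuff.mapReduce(
    function () {
        // Emit 1 per document, keyed by the (SUBJECT, SYS) pair.
        emit({ SUBJECT: this.SUBJECT, SYS: this.SYS }, 1);
    },
    function (key, values) {
        // Sum the emitted 1s to get the count for this key.
        return Array.sum(values);
    },
    { out: { inline: 1 } } // required "out" option; results carry the count in "value"
)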
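
For a count like this, a common map-reduce pattern is to emit 1 per document and sum the emitted values per key. The following is a minimal sketch in plain JavaScript that mimics that pattern on an in-memory array, so the logic can be checked without a MongoDB server. The `runMapReduce` helper and the `docs` sample (the first six rows shown above) are hypothetical stand-ins, not MongoDB APIs; in the shell, `emit` is a global and the grouping is done by `mapReduce` itself.

```javascript
// Hypothetical in-memory stand-in for mapReduce; not a MongoDB API.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // JSON-encoded key -> { key, values }
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  };
  for (const doc of docs) {
    // As in MongoDB, `this` is the current document inside map.
    // Assumption: emit is passed as an argument here instead of being global.
    map.call(doc, emit);
  }
  // Reduce each group's values to a single count.
  return Array.from(groups.values(), (g) => ({
    _id: g.key,
    COUNT: reduce(g.key, g.values),
  }));
}

// Sample documents: the first six rows of the collection listing.
const docs = [
  { SUBJECT: "DD", SYS: "A" },
  { SUBJECT: "AA", SYS: "B" },
  { SUBJECT: "BB", SYS: "A" },
  { SUBJECT: "AA", SYS: "C" },
  { SUBJECT: "BB", SYS: "A" },
  { SUBJECT: "BB", SYS: "A" },
];

const results = runMapReduce(
  docs,
  function (emit) {
    emit({ SUBJECT: this.SUBJECT, SYS: this.SYS }, 1);
  },
  function (key, values) {
    return values.reduce((a, b) => a + b, 0);
  }
);

console.log(JSON.stringify(results));
```

The reduce function must return the same kind of value that map emits (a number here), because MongoDB may call reduce more than once for the same key.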

Because of a limitation, I cannot use the aggregation approach. I tried the following aggregation code:

db.LogBuff.aggregate([ {"$group" : {_id:{SUBJECT:"$SUBJECT",SYS:"$SYS"},COUNT:{$sum:1}}}, {$sort:{_id:1}},]) 

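Although this works for a limited number of records, for a large number it returns this error (note: I am not the root user, so I cannot change the configuration):

assert: command failed: { "ok" : 0, "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", "code" : 16819 } : aggregate failed
_getErrorWithCode@src/mongo/shell/utils.js:25:13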

Have you tried the aggregation framework? Or can you only use MapReduce?

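I used aggregation, but it only works for a limited number of records; for a large number it returns the following memory error (I am not the root user, so I cannot change the configuration) – Kavinda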

assert: command failed: { "ok" : 0, "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", "code" : 16819 } : aggregate failed _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:13:14 – Kavinda

Answer

Try using the allowDiskUse option:

db.LogBuff.aggregate([ {"$group" : {_id:{SUBJECT:"$SUBJECT",SYS:"$SYS"},COUNT:{$sum:1}}}, {$sort:{_id:1}}], {allowDiskUse: true})
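
The same pipeline and option can also be passed from application code. Here is a minimal sketch, assuming the Node.js driver, whose `collection.aggregate(pipeline, options)` accepts `allowDiskUse` the same way the shell does. The names `buildCountPipeline`, `countBySubjectAndSys`, and `stubCollection` are hypothetical, and the stub exists only so the snippet runs without a database.

```javascript
// Build the grouping pipeline used in the answer above.
function buildCountPipeline() {
  return [
    { $group: { _id: { SUBJECT: "$SUBJECT", SYS: "$SYS" }, COUNT: { $sum: 1 } } },
    { $sort: { _id: 1 } },
  ];
}

// `collection` is any object exposing aggregate(pipeline, options),
// for example a collection handle from the Node.js driver (assumption).
function countBySubjectAndSys(collection) {
  // allowDiskUse lets stages that exceed the in-memory limit
  // (104857600 bytes in the error above) write temporary files to disk.
  return collection.aggregate(buildCountPipeline(), { allowDiskUse: true });
}

// Hypothetical stub that records the arguments it receives.
const calls = [];
const stubCollection = {
  aggregate(pipeline, options) {
    calls.push({ pipeline, options });
    return [];
  },
};

countBySubjectAndSys(stubCollection);
console.log(JSON.stringify(calls[0].options));
```

With a real driver collection, the call returns a cursor, which would typically be consumed with `toArray()` or by iterating over it.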

Thanks, it works fine – Kavinda