MongoDB duplicate count problem
I have a collection with the data below (the collection contains more than 100,000 records):
> db.LogBuff.find()
{ "_id" : ObjectId("578899d5d2b76f77d083f16c"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16d"), "SUBJECT" : "AA", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16e"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16f"), "SUBJECT" : "AA", "SYS" : "C" }
{ "_id" : ObjectId("578899d5d2b76f77d083f170"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f171"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f172"), "SUBJECT" : "CC", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f173"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f174"), "SUBJECT" : "CC", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f175"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f176"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f177"), "SUBJECT" : "BB", "SYS" : "C" }
{ "_id" : ObjectId("578899d5d2b76f77d083f178"), "SUBJECT" : "CC", "SYS" : "D" }
{ "_id" : ObjectId("578899d5d2b76f77d083f179"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17a"), "SUBJECT" : "AA", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17b"), "SUBJECT" : "BB", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17c"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17d"), "SUBJECT" : "CC", "SYS" : "C" }
I would like to get output of the following kind:
{ "_id" : { "SUBJECT" : "CC", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "DD", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "B" }, "COUNT" : 2 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "B" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "A" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "D" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "B" }, "COUNT" : 1 }
This is my code:
db.LogBuff.mapReduce(
    function () {
        emit({ SUBJECT: this.SUBJECT, SYS: this.SYS }, this.SYS);
    },
    function (key, values) {
        return $count:1  // <- stuck here
    }
)
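For reference, here is a minimal sketch of how the map and reduce steps could be completed, assuming the goal is one count per (SUBJECT, SYS) pair. The names `mapFn`, `reduceFn` and `runLocally` are made up for illustration. `runLocally` is only an in-memory stand-in for trying the functions outside the database, not a MongoDB API.

```javascript
// Map step: emit the number 1 for every document, keyed by the (SUBJECT, SYS) pair.
// `emit` is supplied by MongoDB when this runs inside mapReduce.
function mapFn() {
    emit({ SUBJECT: this.SUBJECT, SYS: this.SYS }, 1);
}

// Reduce step: add up the partial counts for one key. MongoDB may call reduce
// several times for the same key, so it returns the same kind of value
// (a number) that map emits.
function reduceFn(key, values) {
    return values.reduce(function (total, n) { return total + n; }, 0);
}

// Hypothetical local driver: groups emitted values by key, then reduces each group.
function runLocally(docs) {
    var groups = {};
    globalThis.emit = function (key, value) {
        var k = JSON.stringify(key);
        (groups[k] = groups[k] || { key: key, values: [] }).values.push(value);
    };
    docs.forEach(function (doc) { mapFn.call(doc); });
    return Object.keys(groups).map(function (k) {
        return { _id: groups[k].key, value: reduceFn(groups[k].key, groups[k].values) };
    });
}

var sample = [
    { SUBJECT: "DD", SYS: "A" },
    { SUBJECT: "AA", SYS: "B" },
    { SUBJECT: "DD", SYS: "A" }
];
console.log(JSON.stringify(runLocally(sample)));

// In the mongo shell, the same two functions would be passed as:
//   db.LogBuff.mapReduce(mapFn, reduceFn, { out: { inline: 1 } })
```

Note that mapReduce requires an `out` option, and that it reports each total in a field named `value` rather than `COUNT`.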
Because of some restrictions, I cannot use the aggregation approach. I tried the following aggregation code:
db.LogBuff.aggregate([ {"$group" : {_id:{SUBJECT:"$SUBJECT",SYS:"$SYS"},COUNT:{$sum:1}}}, {$sort:{_id:1}},])
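Although this works for a limited number of records, for a large number it returns this error (note: I am not a root user, so I cannot change the configuration):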
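assert: command failed: { "ok" : 0, "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", "code" : 16819 } :
aggregation failed _getErrorWithCode@src/mongo/shell/utils.js:25:13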
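The error text itself points at an option to try. `allowDiskUse` is passed with the individual `aggregate` call rather than set in the server configuration. Below is a minimal sketch, assuming this account is permitted to use that option. The variable names `pipeline` and `options` are only for illustration.

```javascript
// Same grouping and sort as in the aggregate call above, held in a variable.
var pipeline = [
    { $group: { _id: { SUBJECT: "$SUBJECT", SYS: "$SYS" }, COUNT: { $sum: 1 } } },
    { $sort: { _id: 1 } }
];

// allowDiskUse lets stages that exceed the memory limit write temporary files to disk.
var options = { allowDiskUse: true };

// `db` only exists inside the mongo shell, so the call is guarded;
// elsewhere this block just builds the two objects.
if (typeof db !== "undefined") {
    db.LogBuff.aggregate(pipeline, options).forEach(printjson);
}
```

Since the message names the sort as the stage that ran out of memory, dropping the `$sort` stage would be another thing to try if the order of the results does not matter.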
Have you tried the aggregation framework? Or can you only use MapReduce? –
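I used aggregation, but it only works for a limited number of records; for a large number it returns the following memory error (I am not a root user, so I cannot change the configuration) – Kavinda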
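assert: command failed: { "ok" : 0, "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", "code" : 16819 } : aggregation failed _getErrorWithCode@src/mongo/shell/utils.js:25:13 [email protected]/mongo/shell/assert.js:13:14 – Kavinda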