Your map-reduce code works fine, at least on a very small data set. I think the `return values[0];` in the reduce function must be a copy-paste error. You can verify this in the mongo shell.
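Why `return values[0];` is a problem: MongoDB may call `reduce` several times for the same key and feed the output of an earlier call back in as one of the `values`. The function therefore has to combine *all* of `values` and return an object of the same shape that `map` emits. The sketch below is hypothetical, since the question's `Map` is not reproduced here: it assumes `Map` emits `{count: 1, ScrapeDate: ...}` per document, which matches the `value` objects in the output further down. `Reduce` sums the counts and keeps the latest `ScrapeDate`, and the check at the end runs in plain JavaScript without a server.

```javascript
// Hypothetical Reduce for values shaped like {count: <number>, ScrapeDate: <string>}.
// It combines *all* of `values`, because MongoDB can re-reduce: the result of
// one Reduce call may later come back as an element of `values`.
function Reduce(key, values) {
  var result = { count: 0, ScrapeDate: null };
  values.forEach(function (v) {
    result.count += v.count;
    // ISO "YYYY-MM-DD" strings compare lexicographically, so ">" keeps the latest date.
    if (result.ScrapeDate === null || v.ScrapeDate > result.ScrapeDate) {
      result.ScrapeDate = v.ScrapeDate;
    }
  });
  return result;
}

// Plain-JavaScript check (no MongoDB needed): reducing in one pass and
// re-reducing a partial result must give the same answer.
var key = { name: "c", LocationId: 1, version: 1 };
var a = { count: 1, ScrapeDate: "2000-01-01" };
var b = { count: 1, ScrapeDate: "2001-01-01" };
var c = { count: 1, ScrapeDate: "2002-01-01" };

var onePass = Reduce(key, [a, b, c]);
var rereduced = Reduce(key, [Reduce(key, [a, b]), c]);
console.log(JSON.stringify(onePass));   // {"count":3,"ScrapeDate":"2002-01-01"}
console.log(JSON.stringify(rereduced)); // {"count":3,"ScrapeDate":"2002-01-01"}
```

With `return values[0];` as the whole body, the same two calls would both report a count of 1.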
> Since aggregation does not exist in 2.4, I can't use the aggregation pipeline to find duplicates, so I am trying to find a solution using MapReduce.
You've got this part wrong: the aggregation framework, `db.collection.aggregate()`, was introduced in version 2.2.
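Here is how it can be done with the aggregation framework. It would not be the preferred approach, though, because your data set is very large and, in v2.4, the `$sort` operator is limited to 10% of the system's RAM.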
```javascript
db.collection.aggregate(
  [
    // Sort the records on the 'ScrapeDate' field, in descending order.
    {$sort: {"ScrapeDate": -1}},
    // Group by the key fields and take the 'ScrapeDate' of the first document.
    // Since the input is sorted, the first document in each group holds the
    // highest (latest) 'ScrapeDate'.
    {$group: {"_id": {"name": "$name", "LocationId": "$LocationId", "version": "$version"},
              "ScrapeDate": {$first: "$ScrapeDate"},
              "count": {$sum: 1}}
    },
    // Output only the groups that contain more than one document.
    {$match: {"count": {$gt: 1}}}
  ]
);
```
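Since the `$sort` stage is what runs into the v2.4 memory limit, one possible variant (a suggestion, not part of the pipeline above) drops `$sort` and lets `$group` compute the latest date with the `$max` accumulator, keeping the same `$match`. The function below does not talk to MongoDB; it is a plain-JavaScript model, under a hypothetical name `findDuplicates`, of what those two stages compute, so the logic can be checked on the test documents used later in this answer.

```javascript
// In-memory model (hypothetical helper, not a MongoDB API) of the stages:
//   {$group: {"_id": {"name": "$name", "LocationId": "$LocationId", "version": "$version"},
//             "ScrapeDate": {$max: "$ScrapeDate"},
//             "count": {$sum: 1}}},
//   {$match: {"count": {$gt: 1}}}
function findDuplicates(docs) {
  var groups = {};
  docs.forEach(function (d) {
    // Build a string key from the three grouping fields.
    var id = { name: d.name, LocationId: d.LocationId, version: d.version };
    var k = JSON.stringify(id);
    if (!groups[k]) {
      groups[k] = { _id: id, ScrapeDate: d.ScrapeDate, count: 0 };
    }
    var g = groups[k];
    g.count += 1;                                                  // $sum: 1
    if (d.ScrapeDate > g.ScrapeDate) g.ScrapeDate = d.ScrapeDate;  // $max, no sort needed
  });
  // $match: keep only the groups with more than one document.
  return Object.keys(groups)
    .map(function (k) { return groups[k]; })
    .filter(function (g) { return g.count > 1; });
}

var docs = [
  { name: "c", LocationId: 1, version: 1, ScrapeDate: "2000-01-01" },
  { name: "c", LocationId: 1, version: 1, ScrapeDate: "2001-01-01" },
  { name: "c", LocationId: 1, version: 1, ScrapeDate: "2002-01-01" },
  { name: "d", LocationId: 1, version: 1, ScrapeDate: "2002-01-01" }
];
var dups = findDuplicates(docs);
console.log(JSON.stringify(dups));
// [{"_id":{"name":"c","LocationId":1,"version":1},"ScrapeDate":"2002-01-01","count":3}]
```

Because `$max` only compares values within each group, this avoids sorting the whole collection first; whether it fits your data volume on v2.4 is something to test on your own deployment.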
As for your map-reduce function, it ran without problems on my test data:
```javascript
db.collection.insert({"name":"c","LocationId":1,"version":1,"ScrapeDate":"2000-01-01"});
db.collection.insert({"name":"c","LocationId":1,"version":1,"ScrapeDate":"2001-01-01"});
db.collection.insert({"name":"c","LocationId":1,"version":1,"ScrapeDate":"2002-01-01"});
db.collection.insert({"name":"d","LocationId":1,"version":1,"ScrapeDate":"2002-01-01"});
```
Running the map-reduce:

```javascript
db.collection.mapReduce(Map, Reduce, {out: {"inline": 1}, finalize: Finalize});
```
Output:

```
{
    "results" : [
        {
            "_id" : {
                "name" : "c",
                "LocationId" : 1,
                "version" : 1
            },
            "value" : {
                "count" : 3,
                "ScrapeDate" : "2002-01-01"
            }
        },
        {
            "_id" : {
                "name" : "d",
                "LocationId" : 1,
                "version" : 1
            },
            "value" : null
        }
    ],
    "timeMillis" : 0,
    "counts" : {
        "input" : 4,
        "emit" : 4,
        "reduce" : 1,
        "output" : 2
    },
    "ok" : 1
}
```
Note that the output contains `value: null` for the records that have no duplicates. This is because of your `finalize` function:
```javascript
function Finalize(key, reduced) {
  if (reduced.count > 1)
    return reduced;
  // Otherwise nothing is returned, so keys with a single value (count == 1)
  // end up with "value" : null in the output.
}
```
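The `finalize` function does not filter out keys, so you cannot get only the duplicate keys this way: every key still appears in the map-reduce output. All your finalize function does is blank out the value of the non-duplicate keys, which is exactly what you are seeing.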
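Since the keys cannot be dropped inside map-reduce, one way to get only the duplicates is to filter the result afterwards. The sketch below assumes the inline result object carries a `results` array, as in the output shown above, and copies that output in as a literal so it runs in plain JavaScript; `onlyDuplicates` is a hypothetical helper name.

```javascript
// `output` is the inline map-reduce result shown above, trimmed to the fields used here.
// In the mongo shell it would come from something like:
//   var output = db.collection.mapReduce(Map, Reduce, {out: {inline: 1}, finalize: Finalize});
var output = {
  results: [
    { _id: { name: "c", LocationId: 1, version: 1 },
      value: { count: 3, ScrapeDate: "2002-01-01" } },
    { _id: { name: "d", LocationId: 1, version: 1 }, value: null }
  ],
  ok: 1
};

// Finalize cannot remove keys, so drop the null-valued (non-duplicate) ones afterwards.
function onlyDuplicates(mrOutput) {
  return mrOutput.results.filter(function (r) {
    return r.value !== null;
  });
}

var duplicates = onlyDuplicates(output);
console.log(JSON.stringify(duplicates.map(function (r) { return r._id; })));
// [{"name":"c","LocationId":1,"version":1}]
```

If the map-reduce writes to a collection instead of `inline`, the same filtering can be done with an ordinary query on that collection, e.g. `{"value": {$ne: null}}`. In the mongo shell, `printjson` can replace `console.log`.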