2016-11-25 116 views

Does MongoDB ignore readPreference when running MapReduce on a sharded cluster?

My goal is to have my Map-Reduce jobs always run on the secondary nodes of the shards in my MongoDB cluster.

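To achieve this, I set the readPreference to secondary and the out parameter of the MapReduce command to inline. This works fine on a non-sharded replica set: the job runs on the secondary. On a sharded cluster, however, the job runs on the primary.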

Can someone explain why this happens, or point me to any relevant documentation? I could not find anything in the relevant documentation.

// JavaScript map and reduce functions: sum txnval per custid
public static final String mapfunction = "function() { emit(this.custid, this.txnval); }";
public static final String reducefunction = "function(key, values) { return Array.sum(values); }";
...
private void mapReduce() {
...
// No output collection is specified, so the results are returned inline
MapReduceIterable<Document> iterable = collection.mapReduce(mapfunction, reducefunction);
...
}
...
// Client-wide read preference: route reads to secondaries
MongoClientOptions.Builder options = MongoClientOptions.builder().readPreference(ReadPreference.secondary());
MongoClientURI uri = new MongoClientURI(MONGO_END_POINT, options);
MongoClient client = new MongoClient(uri);
...
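For readers unfamiliar with the job itself, here is a minimal plain-Java sketch of what the map and reduce functions above compute: a sum of txnval per custid. It does not use the MongoDB driver. The class name `MapReduceSketch`, the `Txn` holder, the `double` type for txnval, and the sample values are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapReduceSketch {

    /** Hypothetical stand-in for a txns document: only the two fields the map function reads. */
    static final class Txn {
        final String custid;
        final double txnval;

        Txn(String custid, double txnval) {
            this.custid = custid;
            this.txnval = txnval;
        }
    }

    /**
     * Equivalent of emit(custid, txnval) followed by Array.sum(values):
     * every transaction adds its value to the running total for its customer.
     */
    static Map<String, Double> sumByCustomer(List<Txn> txns) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (Txn t : txns) {
            totals.merge(t.custid, t.txnval, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Illustrative sample data, not taken from the real collection
        List<Txn> sample = List.of(new Txn("c1", 10.0), new Txn("c2", 5.0), new Txn("c1", 2.5));
        System.out.println(sumByCustomer(sample));
    }
}
```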

Log from the secondary when this is executed on a replica set:

2016-11-23T15:05:26.735+0000 I COMMAND [conn671] command test.txns command: mapReduce { mapreduce: "txns", map: function() { emit(this.custid, this.txnval); }, reduce: function(key, values) { return Array.sum(values); }, out: { inline: 1 }, query: null, sort: null, finalize: null, scope: null, verbose: true } planSummary: COUNT keyUpdates:0 writeConflicts:0 numYields:7 reslen:4331 locks:{ Global: { acquireCount: { r: 44 } }, Database: { acquireCount: { r: 3, R: 19 } }, Collection: { acquireCount: { r: 3 } } } protocol:op_query 124ms

Sharded collection:

mongos> db.txns.getShardDistribution() 

Shard Shard-0 at Shard-0/primary.shard0.example.com:27017,secondary.shard0.example.com:27017 
data : 498KiB docs : 9474 chunks : 3 
estimated data per chunk : 166KiB 
estimated docs per chunk : 3158 

Shard Shard-1 at Shard-1/primary.shard1.example.com:27017,secondary.shard1.example.com:27017 
data : 80KiB docs : 1526 chunks : 3 
estimated data per chunk : 26KiB 
estimated docs per chunk : 508 

Totals 
data : 579KiB docs : 11000 chunks : 6 
Shard Shard-0 contains 86.12% data, 86.12% docs in cluster, avg obj size on shard : 53B 
Shard Shard-1 contains 13.87% data, 13.87% docs in cluster, avg obj size on shard : 53B 

Log from the Shard-0 primary:

2016-11-24T08:46:30.828+0000 I COMMAND [conn357] command test.$cmd command: mapreduce.shardedfinish { mapreduce.shardedfinish: { mapreduce: "txns", map: function() { emit(this.custid, this.txnval); }, reduce: function(key, values) { return Array.sum(values); }, out: { inline: 1 }, query: null, sort: null, finalize: null, scope: null, verbose: true, $queryOptions: { $readPreference: { mode: "secondary" } } }, inputDB: "test", shardedOutputCollection: "tmp.mrs.txns_1479977190_0", shards: { Shard-0/primary.shard0.example.com:27017,secondary.shard0.example.com:27017: { result: "tmp.mrs.txns_1479977190_0", timeMillis: 123, timing: { mapTime: 51, emitLoop: 116, reduceTime: 9, mode: "mixed", total: 123 }, counts: { input: 9474, emit: 9474, reduce: 909, output: 101 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1479977190000|103, electionId: ObjectId('7fffffff0000000000000001') } }, Shard-1/primary.shard1.example.com:27017,secondary.shard1.example.com:27017: { result: "tmp.mrs.txns_1479977190_0", timeMillis: 71, timing: { mapTime: 8, emitLoop: 63, reduceTime: 4, mode: "mixed", total: 71 }, counts: { input: 1526, emit: 1526, reduce: 197, output: 101 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1479977190000|103, electionId: ObjectId('7fffffff0000000000000001') } } }, shardCounts: { Shard-0/primary.shard0.example.com:27017,secondary.shard0.example.com:27017: { input: 9474, emit: 9474, reduce: 909, output: 101 }, Shard-1/primary.shard1.example.com:27017,secondary.shard1.example.com:27017: { input: 1526, emit: 1526, reduce: 197, output: 101 } }, counts: { emit: 11000, input: 11000, output: 202, reduce: 1106 } } keyUpdates:0 writeConflicts:0 numYields:0 reslen:4368 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 115ms
2016-11-24T08:46:30.830+0000 I COMMAND [conn46] CMD: drop test.tmp.mrs.txns_1479977190_0
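The log above reports separate counts for each shard and then a mapreduce.shardedfinish step, which suggests that each shard first reduces its own documents and the partial results are combined afterwards. The following is a rough plain-Java sketch of that two-phase idea only, not of MongoDB's actual implementation. The class name `ShardMergeSketch`, the method `mergePartials`, and the sample totals are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardMergeSketch {

    /**
     * Combines per-shard partial sums into one result by applying the same
     * reduce step (a sum) to values that share a key. This only works because
     * summing is associative, so re-reducing already reduced values is safe.
     */
    static Map<String, Double> mergePartials(List<Map<String, Double>> perShard) {
        Map<String, Double> merged = new LinkedHashMap<>();
        for (Map<String, Double> partial : perShard) {
            for (Map.Entry<String, Double> e : partial.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical partial results: customer c1 has transactions on both shards
        Map<String, Double> shard0 = Map.of("c1", 12.5, "c2", 5.0);
        Map<String, Double> shard1 = Map.of("c1", 7.5);
        System.out.println(mergePartials(List.of(shard0, shard1)));
    }
}
```

A key that appears on both shards is counted once per shard before the merge and collapses to a single entry afterwards.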

Any pointers on the expected behavior would be very helpful. Thanks.

Answers


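Since I did not get a response here, I filed a JIRA bug with MongoDB and found out that, as of now, it is not possible to run map-reduce jobs on the secondaries of a sharded MongoDB cluster. Here is the bug report.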


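I wrote a blog post about the reason for this, which is an important limitation for anyone looking to run their MR jobs on MongoDB: https://scalegrid.io/blog/mongodb-performance-running-mongodb-map-reduce-operations-on-secondaries/ – Vaibhaw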
