2012-10-16 93 views
0

AppEngine MapReduce with NDB: DeadlineExceededError

We are trying to use MapReduce heavily in our project. We now see a lot of "DeadlineExceededError" errors in the logs (the traceback differs a bit each time).

One example:

Traceback (most recent call last): 
    File "/base/python27_runtime/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 207, in Handle 
    result = handler(dict(self._environ), self._StartResponse) 
    File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__ 
    rv = self.router.dispatch(request, response) 
    File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher 
    return route.handler_adapter(request, response) 
    File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__ 
    return handler.dispatch() 
    File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch 
    return method(*args, **kwargs) 
    File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/base_handler.py", line 65, in post 
    self.handle() 
    File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/handlers.py", line 208, in handle 
    ctx.flush() 
    File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/context.py", line 333, in flush 
    pool.flush() 
    File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/context.py", line 221, in flush 
    self.__flush_ndb_puts() 
    File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/context.py", line 239, in __flush_ndb_puts 
    ndb.put_multi(self.ndb_puts.items, config=self.__create_config()) 
    File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3625, in put_multi 
    for future in put_multi_async(entities, **ctx_options)] 
    File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 323, in get_result 
    self.check_success() 
    File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 318, in check_success 
    self.wait() 
    File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 302, in wait 
    if not ev.run1(): 
    File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/eventloop.py", line 219, in run1 
    delay = self.run0() 
    File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/eventloop.py", line 181, in run0 
    callback(*args, **kwds) 
    File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 365, in _help_tasklet_along 
    value = gen.send(val) 
    File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 274, in _put_tasklet 
    keys = yield self._conn.async_put(options, datastore_entities) 
    File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1560, in async_put 
    for pbs, indexes in pbsgen: 
    File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1350, in __generate_pb_lists 
    incr_size = pb.lengthString(pb.ByteSize()) + 1 
DeadlineExceededError 

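My questions are:

  • How can we avoid this error?
  • What happens to the job: does it get retried (and if so, how can we control that) or not?
  • Can this end up causing data inconsistency?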
+0

Are you doing too much work in a single step?

+0

It seems so ;) We are now testing batch_size, and it seems to help. It needs more testing, but I will probably accept dragonx's answer.

Answers

2

If you are using an InputReader, you may be able to adjust the default batch_size to reduce the number of entities processed per task.
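A minimal sketch of how a smaller batch_size might be passed when starting a job from Python. It assumes the reader is the library's DatastoreInputReader and that it reads the "entity_kind" and "batch_size" mapper parameters; the model path, handler path, and the value 20 are made up for illustration. The control.start_map call is left as a comment so the snippet runs without the SDK installed.

```python
def build_mapper_params(entity_kind, batch_size):
    """Return mapper parameters for a datastore input reader (illustrative helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return {
        "entity_kind": entity_kind,  # hypothetical model path
        "batch_size": batch_size,    # fewer entities per batch, less work per call
    }


params = build_mapper_params("models.MyEntity", batch_size=20)

# With the mapreduce library on the path, the job would be started roughly like:
#
# from mapreduce import control
# control.start_map(
#     name="resave entities",
#     handler_spec="main.process_entity",  # hypothetical handler
#     reader_spec="mapreduce.input_readers.DatastoreInputReader",
#     mapper_parameters=params,
# )
```

A smaller batch trades throughput for shorter individual requests, so it is worth lowering the value gradually and checking the logs after each change.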

I believe the task queue will retry the task, but you probably don't want that, since it will likely hit the same DeadlineExceededError.

Data inconsistency is possible.
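To illustrate why a retried task can leave inconsistent data, here is a small simulation. A plain dict stands in for the datastore and all function names are hypothetical; none of this is the mapreduce library's API. A map step that adds to a stored value double-counts when the same slice runs twice, while a step that writes a value derived only from its input ends in the same state.

```python
def map_increment(store, item):
    # Not idempotent: every run adds to whatever is already stored.
    store[item["id"]] = store.get(item["id"], 0) + item["amount"]


def map_overwrite(store, item):
    # Idempotent: the stored value depends only on the input item.
    store[item["id"]] = item["amount"]


def run_slice(map_fn, store, items, runs):
    # Process the same slice `runs` times, as a retry would.
    for _ in range(runs):
        for item in items:
            map_fn(store, item)
    return store


items = [{"id": "a", "amount": 5}, {"id": "b", "amount": 7}]

once = run_slice(map_increment, {}, items, runs=1)
retried = run_slice(map_increment, {}, items, runs=2)
safe_once = run_slice(map_overwrite, {}, items, runs=1)
safe_retried = run_slice(map_overwrite, {}, items, runs=2)

print(once == retried)            # False: the retry changed the result
print(safe_once == safe_retried)  # True: the retry was harmless
```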

See this question as well: App Engine - Task Queue Retry Count with Mapper API

3

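Apparently you are doing more puts than can be inserted in a single datastore call. You have several options:

  1. If this is a relatively rare event, ignore it. Mapreduce will retry the slice and lower the put pool size. Make sure your map is idempotent.
  2. Take a look at http://code.google.com/p/appengine-mapreduce/source/browse/trunk/python/src/mapreduce/context.py - in your main.py you can lower DATASTORE_DEADLINE, MAX_ENTITY_COUNT and MAX_POOL_SIZE to reduce the size of the pool for the whole mapreduce.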
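The second option could be sketched as below. The three constant names come from the answer above; the helper lower_pool_limits, the stand-in module, and every numeric value are assumptions for illustration, not values taken from the library. In a real main.py the stand-in would be replaced by `from mapreduce import context`. Whether the library picks up the changed values depends on how it reads them (constants bound as default arguments at import time, for instance, would not change), so this is worth verifying against the version you deploy.

```python
import types


def lower_pool_limits(module, **limits):
    """Set each named constant on `module`, but only ever to a lower value."""
    applied = {}
    for name, new_value in limits.items():
        current = getattr(module, name)  # raises AttributeError on a wrong name
        if new_value < current:
            setattr(module, name, new_value)
            applied[name] = new_value
    return applied


# Stand-in for `from mapreduce import context`; the values are placeholders.
context = types.ModuleType("context")
context.DATASTORE_DEADLINE = 15
context.MAX_ENTITY_COUNT = 500
context.MAX_POOL_SIZE = 900 * 1000

applied = lower_pool_limits(
    context,
    MAX_ENTITY_COUNT=100,
    MAX_POOL_SIZE=200 * 1000,
)
```

Refusing to raise a limit and failing loudly on an unknown name keeps a typo in main.py from silently leaving the pool at its original size.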