
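Backing up large datastore kinds (1TB+) to Google Cloud Storage

Has anyone successfully backed up large datastore kinds to Cloud Storage? This is an experimental feature, so support on Google's end is pretty sketchy.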

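The kind in question, which we want to back up to Cloud Storage (ultimately so that we can ingest it from Cloud Storage into BigQuery), is currently 1.2TB in size.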

- description: BackUp
  url: /_ah/datastore_admin/backup.create?name=OurApp&filesystem=gs&gs_bucket_name=OurBucket&queue=backup&kind=LargeKind
  schedule: every day 00:00
  timezone: America/Regina
  target: ah-builtin-python-bundle

We keep running into the following error message:

Traceback (most recent call last):
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 182, in handle
    input_reader, shard_state, tstate, quota_consumer, ctx)
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 263, in process_inputs
    entity, input_reader, ctx, transient_shard_state):
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 318, in process_data
    output_writer.write(output, ctx)
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 711, in write
    ctx.get_pool("file_pool").append(self._filename, str(data))
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 266, in append
    self.flush()
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 288, in flush
    f.write(data)
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 297, in __exit__
    self.close()
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 291, in close
    self._make_rpc_call_with_retry('Close', request, response)
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 427, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 250, in _make_call
    rpc.check_success()
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 570, in check_success
    self.__rpc.CheckSuccess()
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 133, in CheckSuccess
    raise self.exception
DeadlineExceededError: The API call file.Close() took too long to respond and was cancelled.

Answer


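There seems to be an undocumented time limit of 28 seconds on write operations from GAE to Cloud Storage. This also applies to writes made on a backend, so the maximum size of a file you can create in Cloud Storage from GAE depends on your throughput. Our solution is to split the file: whenever the writer task approaches 20 seconds, it closes the current file and opens a new one, and then we join these files locally. For us this results in files of about 500KB (compressed), so this might not be an acceptable solution for you...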
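
To make the splitting idea concrete, here is a minimal sketch of time-based file rotation followed by a local join. It is not our production code: it writes to ordinary local files instead of going through the App Engine Files API, and the names RotatingWriter, join_parts, max_open_seconds and the ".partNNNNN" suffix are all made up for illustration. The 20-second budget is the figure mentioned above; you would tune it to stay under whatever limit you actually observe.

```python
import shutil
import time


class RotatingWriter(object):
    """Hypothetical helper: writes data to numbered part files and starts
    a new part once the current one has been open for too long."""

    def __init__(self, prefix, max_open_seconds=20.0, clock=time.time):
        self._prefix = prefix
        self._max_open_seconds = max_open_seconds
        self._clock = clock  # injectable so the rotation can be tested
        self._index = 0
        self._file = None
        self._opened_at = None
        self.parts = []  # paths of all part files, in write order

    def _open_next(self):
        path = "%s.part%05d" % (self._prefix, self._index)
        self._index += 1
        self._file = open(path, "wb")
        self._opened_at = self._clock()
        self.parts.append(path)

    def write(self, data):
        if self._file is None:
            self._open_next()
        elif self._clock() - self._opened_at >= self._max_open_seconds:
            # Time budget used up: close this part and continue in a new one.
            self._file.close()
            self._open_next()
        self._file.write(data)

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None


def join_parts(parts, destination):
    """Concatenate the part files, in order, into a single local file."""
    with open(destination, "wb") as out:
        for path in parts:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
```

In this sketch the check happens before each write, so a part stays open slightly longer than the budget if a single write is slow; that is why the budget sits well below the observed limit. After the job finishes you would download the parts and call join_parts(writer.parts, "backup.bin") (the file name is just an example).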