Dataflow pipeline "lost contact with the service"
I am running into trouble with an Apache Beam pipeline on Google Cloud Dataflow.
The pipeline is simple: read JSON from GCS, extract text from some nested fields, and write the result back to GCS.
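For illustration only, here is a minimal sketch of what the extraction and cleanup steps might look like. The function names mirror the step names that appear in the error (extract_text_from_raw, RemoveLineBreaks), but the bodies and the nested field path ("payload", "body", "text") are hypothetical placeholders, not the actual pipeline code or schema.

```python
import json


def extract_text_from_raw(line, fields=(("payload", "body", "text"),)):
    """Yield the text found at each nested field path in one JSON line.

    The default field path is a hypothetical placeholder, not the real schema.
    Malformed lines and missing fields yield nothing instead of raising.
    """
    try:
        record = json.loads(line)
    except ValueError:
        # Not valid JSON: skip the line rather than fail the whole bundle.
        return
    for path in fields:
        node = record
        for key in path:
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if isinstance(node, str):
            yield node


def remove_line_breaks(text):
    """Join the lines of a text with single spaces."""
    return " ".join(text.splitlines())
```

Because the extraction function is a generator that may yield zero or more strings per input line, it is the kind of function that would be passed to beam.FlatMap, with the cleanup function applied per element afterwards.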
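The pipeline works fine when tested with a smaller subset of the input files, but when I run it on the full data set I get the following error (after it has run normally through roughly 260M items).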
Somehow the worker "eventually lost contact with the service":
(8662a188e74dae87): Workflow failed. Causes: (95e9c3f710c71bc2): S04:ReadFromTextWithFilename/Read+FlatMap(extract_text_from_raw)+RemoveLineBreaks+FormatText+WriteText/Write/WriteImpl/WriteBundles/Do+WriteText/Write/WriteImpl/Pair+WriteText/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteText/Write/WriteImpl/GroupByKey/Reify+WriteText/Write/WriteImpl/GroupByKey/Write failed., (da6389e4b594e34b): A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. The work item was attempted on:
extract-tags-150110997000-07261602-0a01-harness-jzcn,
extract-tags-150110997000-07261602-0a01-harness-828c,
extract-tags-150110997000-07261602-0a01-harness-3w45,
extract-tags-150110997000-07261602-0a01-harness-zn6v
The stack traces show "Failed to update work status" / "Progress reporting thread got error" errors, which accompany the "worker eventually lost contact with the service" failure:
Exception in worker loop: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 776, in run
    deferred_exception_details=deferred_exception_details)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 629, in do_work
    exception_details=exception_details)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 168, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 490, in report_completion_status
    exception_details=exception_details)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 298, in report_status
    work_executor=self._work_executor)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 333, in report_status
    self._client.projects_locations_jobs_workItems.ReportStatus(request))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py", line 467, in ReportStatus
    config, request, global_params=global_params)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 723, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 729, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 600, in __ProcessHttpResponse
    http_response.request_url, method_config, request)
HttpError: HttpError accessing <https://dataflow.googleapis.com/v1b3/projects/qollaboration-live/locations/us-central1/jobs/2017-07-26_16_02_36-1885237888618334364/workItems:reportStatus?alt=json>: response: <{'status': '400', 'content-length': '360', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Wed, 26 Jul 2017 23:54:12 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>, content <{ "error": { "code": 400, "message": "(7f8a0ec09d20c3a3): Failed to publish the result of the work update. Causes: (7f8a0ec09d20cd48): Failed to update work status. Causes: (afa1cd74b2e65619): Failed to update work status., (afa1cd74b2e65caa): Work \"6306998912537661254\" not leased (or the lease was lost).", "status": "INVALID_ARGUMENT" } } >
And finally:
HttpError: HttpError accessing <https://dataflow.googleapis.com/v1b3/projects/[projectid-redacted]/locations/us-central1/jobs/2017-07-26_18_28_43-10867107563808864085/workItems:reportStatus?alt=json>: response: <{'status': '400', 'content-length': '358', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Thu, 27 Jul 2017 02:00:10 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>, content <{ "error": { "code": 400, "message": "(5845363977e915c1): Failed to publish the result of the work update. Causes: (5845363977e913a8): Failed to update work status. Causes: (44379dfdb8c2b47): Failed to update work status., (44379dfdb8c2e88): Work \"9100669328839864782\" not leased (or the lease was lost).", "status": "INVALID_ARGUMENT" } } >
at __ProcessHttpResponse (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:600)
at ProcessHttpResponse (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:729)
at _RunMethod (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:723)
at ReportStatus (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py:467)
at report_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py:333)
at report_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:298)
at report_completion_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:490)
at wrapper (/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py:168)
at do_work (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:629)
at run (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:776)
To me this looks like an error internal to Dataflow. Can anyone confirm that? Is there a workaround?
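Earlier I was mostly looking at the error messages in the stack traces, but it looks like there are some error-like messages in the info section as well. I have updated the question with the additional details. – Andreas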
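Did you look at the "Stack Traces" tab of the UI, as the post suggests? Specifically, there should be messages there that are unrelated to reporting status; those are the ones that would indicate the actual problem in your code. It looks like you just added more information from the "Job Logs" tab. –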
The HTTP error is the only error in the Stack Traces tab. The rest of what I posted comes from Stackdriver. – Andreas