Dataflow error - "IOException: Failed to write to GCS path.." "Backend Error 500"

One of our pipelines threw the error below. This is the first time we have seen it. The job read roughly 625 million rows from a BigQuery table, still completed, and was reported as "Succeeded" in the console. Our concern is that the file Dataflow failed to write to GCS (Dataflow writes files to GCS and then loads them into BigQuery) was never loaded into BigQuery, so we may now be missing some data.

Because of the sheer volume of data we process, it is hard for us to determine whether those rows were loaded.

Is there any way to tell whether Dataflow actually loaded that file?
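One crude check (not from the original thread) is to compare the source table's row count with the destination table's. Below is a minimal sketch using the google-cloud-bigquery Java client; the project, dataset, and table names are placeholders, and this only helps when the pipeline writes one output row per input row:

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.QueryJobConfiguration;
    import com.google.cloud.bigquery.TableResult;

    // Sketch: compare row counts between the source and destination tables.
    public class RowCountCheck {
        // Hypothetical table names -- substitute the pipeline's real source and destination.
        private static final String SOURCE = "my_project.my_dataset.source_table";
        private static final String DEST = "my_project.my_dataset.dest_table";

        public static void main(String[] args) throws InterruptedException {
            BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
            long sourceRows = countRows(bigquery, SOURCE);
            long destRows = countRows(bigquery, DEST);
            System.out.printf("source=%d dest=%d difference=%d%n",
                    sourceRows, destRows, sourceRows - destRows);
        }

        private static long countRows(BigQuery bigquery, String table) throws InterruptedException {
            QueryJobConfiguration config = QueryJobConfiguration
                    .newBuilder("SELECT COUNT(*) AS n FROM `" + table + "`")
                    .setUseLegacySql(false)
                    .build();
            TableResult result = bigquery.query(config);
            return result.iterateAll().iterator().next().get("n").getLongValue();
        }
    }

For aggregating pipelines, a per-key count or checksum compared on both sides is a better signal than a plain row count.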

Job ID: 2015-05-27_18_21_21-8377993823053896089

2015-05-28T01:21:23.210Z: (c1e36887ebb5e3b3): Autoscaling: Enabled for job /workflows/wf-2015-05-27_18_21_21-8377993823053896089 
2015-05-28T01:22:23.711Z: (45988c062ea96b38): Autoscaling: Resizing worker pool from 1 to 3. 
2015-05-28T01:23:53.713Z: (45988c062ea96352): Autoscaling: Resizing worker pool from 3 to 12. 
2015-05-28T01:25:23.715Z: (45988c062ea96b6c): Autoscaling: Resizing worker pool from 12 to 48. 
2015-05-28T01:26:53.716Z: (45988c062ea96386): Autoscaling: Resizing worker pool from 48 to 64. 
2015-05-28T01:48:48.863Z: (54b9f9ed2402c4e7): java.io.IOException: Failed to write to GCS path gs://<removed>/15697574167464387868/dax-tmp-2015-05-27_18_21_21-8377993823053896089-S09-1-731cba632206348a/-shard-00000-of-00001_C183_00000-of-00001-try-52ba464032d439ee-endshard.json. 
    at com.google.cloud.dataflow.sdk.util.gcsio.GoogleCloudStorageWriteChannel.throwIfUploadFailed(GoogleCloudStorageWriteChannel.java:372) 
    at com.google.cloud.dataflow.sdk.util.gcsio.GoogleCloudStorageWriteChannel.close(GoogleCloudStorageWriteChannel.java:270) 
    at com.google.cloud.dataflow.sdk.runners.worker.TextSink$TextFileWriter.close(TextSink.java:243) 
    at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.finish(WriteOperation.java:100) 
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:74) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:130) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:95) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:139) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:124) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone 
{ 
    "code" : 500, 
    "errors" : [ { 
    "domain" : "global", 
    "message" : "Backend Error", 
    "reason" : "backendError" 
    } ], 
    "message" : "Backend Error" 
} 
    at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145) 
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) 
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) 
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) 
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) 
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) 
    at com.google.cloud.dataflow.sdk.util.gcsio.GoogleCloudStorageWriteChannel$UploadOperation.run(GoogleCloudStorageWriteChannel.java:166) 
    ... 3 more 

2015-05-28T01:48:53.870Z: (4aaf52256f502f1a): Failed task is going to be retried. 
2015-05-28T02:00:49.444Z: S09: (aafd22d37feb496e): Unable to delete temporary files gs://<removed>/15697574167464387868/dax-tmp-2015-05-27_18_21_21-8377993823053896089-S09-1-731cba632206348a/@DAX.json$ Causes: (aafd22d37feb4227): Unable to delete directory: gs://<removed>/15697574167464387868/dax-tmp-2015-05-27_18_21_21-8377993823053896089-S09-1-731cba632206348a. 

Answer


Dataflow retries failed tasks (up to 4 times). In this case the error appears to have been transient and the retried task succeeded, so your data should be complete.
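For context on what "retried" means here: transient backend errors like the 500/410 in the log are typically handled by retrying the operation with exponential backoff, and only if every attempt fails is the task marked failed. A minimal illustrative sketch of that pattern (not the Dataflow service's actual implementation):

    import java.io.IOException;
    import java.util.concurrent.Callable;

    // Illustrative retry-with-exponential-backoff helper for transient I/O failures.
    public class BackoffRetry {
        public static <T> T withRetries(Callable<T> op, int maxAttempts) throws Exception {
            long sleepMillis = 1000;
            for (int attempt = 1; ; attempt++) {
                try {
                    return op.call();                 // success: return the result
                } catch (IOException e) {             // treat IOException as potentially transient
                    if (attempt >= maxAttempts) {
                        throw e;                      // out of attempts: surface the failure
                    }
                    Thread.sleep(sleepMillis);        // back off before the next attempt
                    sleepMillis *= 2;                 // double the wait each time
                }
            }
        }
    }

This matches the "Failed task is going to be retried." line in the log above: the first write attempt failed, a later attempt succeeded, and the job finished as "Succeeded".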