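One of our pipelines threw the error below. This is the first time we have seen it. The job was processing about 625 million rows from a BigQuery table. It still completed and is recorded as "succeeded" in the console. Our concern is that the file Dataflow failed to write to GCS (Dataflow writes to GCS and then loads into BigQuery) was never loaded into BigQuery, so we may now be missing some data. The error in short: Dataflow error - "IOException: Failed to write to GCS path.." "Backend Error 500"
Because of the volume of data involved, it is hard for us to determine whether those rows were loaded.
Is there any way to find out whether Dataflow actually loaded that file?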
Job ID: 2015-05-27_18_21_21-8377993823053896089
2015-05-28T01:21:23.210Z: (c1e36887ebb5e3b3): Autoscaling: Enabled for job /workflows/wf-2015-05-27_18_21_21-8377993823053896089
2015-05-28T01:22:23.711Z: (45988c062ea96b38): Autoscaling: Resizing worker pool from 1 to 3.
2015-05-28T01:23:53.713Z: (45988c062ea96352): Autoscaling: Resizing worker pool from 3 to 12.
2015-05-28T01:25:23.715Z: (45988c062ea96b6c): Autoscaling: Resizing worker pool from 12 to 48.
2015-05-28T01:26:53.716Z: (45988c062ea96386): Autoscaling: Resizing worker pool from 48 to 64.
2015-05-28T01:48:48.863Z: (54b9f9ed2402c4e7): java.io.IOException: Failed to write to GCS path gs://<removed>/15697574167464387868/dax-tmp-2015-05-27_18_21_21-8377993823053896089-S09-1-731cba632206348a/-shard-00000-of-00001_C183_00000-of-00001-try-52ba464032d439ee-endshard.json.
at com.google.cloud.dataflow.sdk.util.gcsio.GoogleCloudStorageWriteChannel.throwIfUploadFailed(GoogleCloudStorageWriteChannel.java:372)
at com.google.cloud.dataflow.sdk.util.gcsio.GoogleCloudStorageWriteChannel.close(GoogleCloudStorageWriteChannel.java:270)
at com.google.cloud.dataflow.sdk.runners.worker.TextSink$TextFileWriter.close(TextSink.java:243)
at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.finish(WriteOperation.java:100)
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:74)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:130)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:95)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:139)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:124)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone
{
"code" : 500,
"errors" : [ {
"domain" : "global",
"message" : "Backend Error",
"reason" : "backendError"
} ],
"message" : "Backend Error"
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.dataflow.sdk.util.gcsio.GoogleCloudStorageWriteChannel$UploadOperation.run(GoogleCloudStorageWriteChannel.java:166)
... 3 more
2015-05-28T01:48:53.870Z: (4aaf52256f502f1a): Failed task is going to be retried.
2015-05-28T02:00:49.444Z: S09: (aafd22d37feb496e): Unable to delete temporary files gs://<removed>/15697574167464387868/dax-tmp-2015-05-27_18_21_21-8377993823053896089-S09-1-731cba632206348a/@DAX.json$ Causes: (aafd22d37feb4227): Unable to delete directory: gs://<removed>/15697574167464387868/dax-tmp-2015-05-27_18_21_21-8377993823053896089-S09-1-731cba632206348a.
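As a first sanity check on a log like the one above, the following is a minimal sketch that scans the job messages for GCS write failures and reports any that are not followed by a "Failed task is going to be retried." notice. The function name `find_unretried_write_failures` and the pairing rule (each retry notice clears the oldest outstanding failure) are assumptions made for illustration, not documented Dataflow behaviour. A retry notice only shows that the task was rescheduled; it does not prove the rows reached BigQuery, so comparing source and destination row counts would still be needed.

```python
import re

# Matches the timestamp prefix used by the job log lines above,
# e.g. "2015-05-28T01:48:48.863Z: (54b9f9ed2402c4e7): ...".
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z): (.*)$")


def find_unretried_write_failures(lines):
    """Return timestamps of GCS write failures with no later retry notice.

    Assumption (hypothetical pairing rule): each retry notice clears the
    oldest failure that is still outstanding.
    """
    pending = []  # timestamps of failures still waiting for a retry notice
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # stack-trace and JSON lines carry no timestamp
        timestamp, message = match.groups()
        if "Failed to write to GCS path" in message:
            pending.append(timestamp)
        elif "Failed task is going to be retried" in message and pending:
            pending.pop(0)
    return pending
```

Feeding it the lines of the log above should return an empty list, since the single write failure at 01:48:48 is followed by a retry notice at 01:48:53.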