2017-01-03

I have been using the same code for a long time and it used to work. When I re-ran our bulk loader it failed with a "not enough disk space" error, so I increased the disk size and ran it again; this time the Google Dataflow job fails repeatedly with "Pipe broken", as shown below:

(84383c8e79f9b6a1): java.io.IOException: java.io.IOException: Pipe broken 
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431) 
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289) 
    at com.google.cloud.dataflow.sdk.runners.worker.TextSink$TextFileWriter.close(TextSink.java:243) 
    at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.finish(WriteOperation.java:100) 
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:254) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:191) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:144) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:180) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:161) 
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:148) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.io.IOException: Pipe broken 
    at java.io.PipedInputStream.read(PipedInputStream.java:321) 
    at java.io.PipedInputStream.read(PipedInputStream.java:377) 
    at com.google.api.client.util.ByteStreams.read(ByteStreams.java:181) 
    at com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629) 
    at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409) 
    at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) 
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427) 
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) 
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) 
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357) 
    ... 4 more 
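
For reference, this is roughly how the disk size was increased. A minimal sketch assuming the Dataflow SDK 1.x DataflowPipelineOptions (which includes the diskSizeGb worker-pool option); the class name BulkLoaderMain and the 250 GB value are illustrative only:

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

    public class BulkLoaderMain {
      public static void main(String[] args) {
        // Parse standard Dataflow options from the command line, e.g.
        //   --runner=BlockingDataflowPipelineRunner --project=... --stagingLocation=gs://...
        DataflowPipelineOptions options = PipelineOptionsFactory
            .fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

        // Larger persistent disk per worker; 250 is an illustrative value,
        // equivalent to passing --diskSizeGb=250 on the command line.
        options.setDiskSizeGb(250);

        Pipeline p = Pipeline.create(options);
        // ... build the bulk-load pipeline here ...
        p.run();
      }
    }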

This error used to appear occasionally and the batch job would still finish, but now the job no longer completes and fails partway through after several hours.

I am blocked by this error and not sure how to proceed to get our bulk loader running again.

What is the job ID? – jkff

@jkff Thanks for the reply. These are some of the job IDs I tried: 2016-12-29_12_29_45-1148799671021575971, 2016-12-27_13_15_06-5770071057185164024, 2016-12-28_01_43_39-6724563055033327735 – Lahiru

Thanks. I have opened an internal ticket to investigate the Cloud Storage issue and will keep you updated. – jkff

Answer

Posting an answer to resolve the last question raised in the comment thread above.

The message "CoGbkResult has more than 10000 elements, reiteration (which may be slow) is required" is not an error. 10,000 elements is the maximum number held in memory at once, and the message is simply letting you know that if a key has more than 10,000 elements, the remaining results have to be re-iterated.
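
To make the behaviour concrete, here is a minimal sketch of the kind of join this message comes from, using the Dataflow SDK 1.x CoGroupByKey/CoGbkResult API; the class name, tuple tags, and String element types are illustrative only. Keys with more than 10,000 joined values still work; the tail is just re-read on each pass over the Iterable rather than kept in memory:

    import com.google.cloud.dataflow.sdk.transforms.DoFn;
    import com.google.cloud.dataflow.sdk.transforms.ParDo;
    import com.google.cloud.dataflow.sdk.transforms.join.CoGbkResult;
    import com.google.cloud.dataflow.sdk.transforms.join.CoGroupByKey;
    import com.google.cloud.dataflow.sdk.transforms.join.KeyedPCollectionTuple;
    import com.google.cloud.dataflow.sdk.values.KV;
    import com.google.cloud.dataflow.sdk.values.PCollection;
    import com.google.cloud.dataflow.sdk.values.TupleTag;

    public class CoGbkExample {
      // Illustrative tags for two keyed inputs being joined.
      static final TupleTag<String> LEFT = new TupleTag<String>();
      static final TupleTag<String> RIGHT = new TupleTag<String>();

      static PCollection<String> join(PCollection<KV<String, String>> left,
                                      PCollection<KV<String, String>> right) {
        // Group both inputs by key into a single CoGbkResult per key.
        PCollection<KV<String, CoGbkResult>> grouped =
            KeyedPCollectionTuple.of(LEFT, left)
                .and(RIGHT, right)
                .apply(CoGroupByKey.<String>create());

        return grouped.apply(ParDo.of(new DoFn<KV<String, CoGbkResult>, String>() {
          @Override
          public void processElement(ProcessContext c) {
            CoGbkResult result = c.element().getValue();
            // getAll() returns an Iterable; only about the first 10,000 values
            // per key are cached in memory, so iterating a larger key re-reads
            // the remainder, which is what the "reiteration (which may be slow)"
            // message is warning about.
            for (String value : result.getAll(LEFT)) {
              c.output(c.element().getKey() + ":" + value);
            }
          }
        }));
      }
    }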

I would suggest continuing to debug the problem over [email protected], as jkff suggested, rather than in the comment thread, since it has grown beyond the scope of a Stack Overflow question.