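CUDA_ERROR_OUT_OF_MEMORY on Ubuntu 14.04 with CUDA 8

I'm using TensorFlow with CUDA 8 on Ubuntu 14.04. My GPU is a GeForce GT 740M, and I'm new to GPUs. Sometimes, after I have run the same script on the GPU several times, I get an out-of-memory error, which goes away after the next reboot. I really don't know how to solve this, so thank you for sharing your expertise.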
Here is the error message:
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GT 740M
major: 3 minor: 5 memoryClockRate (GHz) 1.0325
pciBusID 0000:01:00.0
Total memory: 1.96GiB
Free memory: 118.75MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 740M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 118.75M (124518400 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)
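
One detail the log already shows: the device reports 1.96GiB total but only 118.75MiB free before TensorFlow has allocated anything, so most of the memory appears to be held by something else when the script starts. What is holding it is not visible in the log. A minimal Python sketch for pulling those two figures out of a saved log and computing the free fraction follows; the function names `parse_gpu_memory` and `free_fraction` are hypothetical helpers, not part of TensorFlow.

```python
import re

# Byte multipliers for the binary units TensorFlow prints in its device banner.
_UNITS = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}
_PATTERN = re.compile(r"(Total|Free) memory:\s*([\d.]+)\s*(KiB|MiB|GiB)")


def parse_gpu_memory(log_text):
    """Return a dict with 'total' and/or 'free' in bytes, as found in the log."""
    found = {}
    for kind, value, unit in _PATTERN.findall(log_text):
        found[kind.lower()] = float(value) * _UNITS[unit]
    return found


def free_fraction(log_text):
    """Return free/total as a float, or None if either figure is missing."""
    mem = parse_gpu_memory(log_text)
    if "total" not in mem or "free" not in mem:
        return None
    return mem["free"] / mem["total"]


if __name__ == "__main__":
    sample = "Total memory: 1.96GiB\nFree memory: 118.75MiB\n"
    print("free fraction: {:.1%}".format(free_fraction(sample)))
```

A very small free fraction at startup suggests the memory was already taken before this run, rather than exhausted by it.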
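Well, without seeing your code it's impossible to be sure, but it sounds like you aren't releasing resources, which results in a memory leak (another possibility is memory fragmentation). These problems are always a pain to track down, and doing it on a GPU makes things even more fun. You'll either need to track every memory allocation and make sure each one is cleaned up, or you'll have to strip out large chunks of code until the problem goes away. – Basic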
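
Before cutting code, one way to start tracking where the memory went is to check whether an earlier run is still holding it. The sketch below is a rough Python helper that asks `nvidia-smi` for the compute processes currently using the GPU. It assumes `nvidia-smi` is on the PATH and that the driver reports per-process usage for this card, which is not guaranteed on every GeForce model. The function names are hypothetical.

```python
import subprocess

# nvidia-smi query for the processes currently holding GPU memory.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, name, used_MiB' lines into (pid, name, used_mib) tuples."""
    rows = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        pid, name, used = parts
        try:
            used_mib = float(used)
        except ValueError:
            used_mib = None  # the driver did not report a number
        rows.append((int(pid), name, used_mib))
    return rows


def list_gpu_processes():
    """Return the parsed process list, or [] if nvidia-smi cannot be run."""
    try:
        out = subprocess.check_output(QUERY, universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_compute_apps(out)


if __name__ == "__main__":
    for pid, name, used_mib in list_gpu_processes():
        print(pid, name, used_mib)
```

If a Python process from a previous run appears in the list, ending it should free its memory without a reboot. If nothing is listed, the leak is more likely inside the running script itself.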