I am trying to train a seq2seq model with the tf-seq2seq wrapper on a 1080 Ti (11 GB) GPU, but tensorflow: CUDA_ERROR_OUT_OF_MEMORY keeps occurring:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Graphics Device
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:03:00.0
Total memory: 10.91GiB
Free memory: 10.75GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:03:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 10.91G (11715084288 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 12337 get requests, put_count=10124 evicted_count=1000 eviction_rate=0.0987752 and unsatisfied allocation rate=0.268542
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Saving checkpoints for 1 into ../model/model.ckpt.
INFO:tensorflow:step = 1, loss = 5.07399
It looks like TensorFlow tries to take the GPU's total memory (10.91 GiB), even though only 10.75 GiB is actually free. I have tried different network sizes (even nmt_small) and always get the error shown above.
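TensorFlow 1.x does reserve (nearly) all of the GPU's memory up front by default, which is why the log shows one failed 10.91 GiB allocation. With the plain TF 1.x session API the reservation can be relaxed so the process only grows it as needed. This is a minimal sketch using the generic tf.ConfigProto options, not tf-seq2seq's own configuration interface:

import tensorflow as tf

# Generic TF 1.x way to avoid reserving the whole 11 GB card in one block.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory incrementally as needed
# Alternatively, cap the reservation at a fraction of the card's memory:
# config.gpu_options.per_process_gpu_memory_fraction = 0.9

with tf.Session(config=config) as sess:
    # build and run the seq2seq training graph here
    pass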
I am training with mini-batches. The batch size is 32, and reducing it to 16 did not help. The problem is that my GPU simply cannot allocate 10.91 GiB. – AmirHJ
Testing this, what worked for me was 'with tf.Session(config = tf.ConfigProto(allow_soft_placement = True, log_device_placement = True)) as sess:' –
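Expanded into a self-contained form, the commenter's suggestion would look roughly like the sketch below; the real seq2seq training graph is omitted and the constant op is only a stand-in:

import tensorflow as tf

# Sketch of the commenter's session configuration (illustrative only).
x = tf.constant([1.0, 2.0, 3.0])
y = tf.reduce_sum(x)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                       log_device_placement=True)) as sess:
    print(sess.run(y))  # log_device_placement prints which device each op ran on

allow_soft_placement lets ops fall back to the CPU when they cannot be placed on the GPU, and log_device_placement logs where each op actually runs, which can help narrow down what is consuming the memory.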