I am having trouble running a TensorFlow program on GPU 1. Whether I launch it with CUDA_VISIBLE_DEVICES=1 python program.py
or use tf.device('/gpu:1')
inside the program, TensorFlow only ever works on GPU 0 and I keep getting the following error:
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:04:00.0
Total memory: 12.00GiB
Free memory: 11.90GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:73] Allocating 11.31GiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:83] GPU 0 memory begins at 0x9047a0000 extends to 0xbd8325334
F tensorflow/stream_executor/cuda/cuda_driver.cc:383] Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(context) (0 vs. 216)
F tensorflow/stream_executor/cuda/cuda_driver.cc:383] Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(context) (0 vs. 216)
F tensorflow/stream_executor/cuda/cuda_driver.cc:383] Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(context) (0 vs. 216)
Aborted
This does not happen when running on GPU 0, but the problem is that someone else is using GPU 0 while GPU 1 sits idle.
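For reference, here is a minimal sketch of the two selection approaches mentioned above (the real program is larger; the matmul below is just a stand-in model, and the two approaches are used separately, not together):

    import tensorflow as tf

    # Approach 1: restrict visibility before the process starts, so the one
    # remaining device shows up inside TensorFlow as /gpu:0:
    #   CUDA_VISIBLE_DEVICES=1 python program.py

    # Approach 2: pin the ops to the second physical GPU explicitly.
    with tf.device('/gpu:1'):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
        c = tf.matmul(a, b)

    # allow_soft_placement is left off so placement problems surface
    # instead of silently falling back to another device.
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        print(sess.run(c))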
I'm guessing there are some hard-coded device 0 references somewhere in the TensorFlow codebase. – talonmies
@talonmies I routinely use TensorFlow with different GPUs (using CUDA_VISIBLE_DEVICES to select the GPU), so I'm quite sure that approach works in general – etarion
@etarion: Working *in general* and working in this particular case are two different things: clearly the internal context handling routines are blowing up - error 216 is CUDA_ERROR_CONTEXT_ALREADY_IN_USE – talonmies