2017-06-05 29 views

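Unusual CUDA-related error in TensorFlow

I have been using TensorFlow for nearly two years and have never seen anything like this. On a new Ubuntu box, I installed TensorFlow in a virtualenv. When I run the sample code, I get an invalid device error. It happens on the call to tf.Session().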

WARNING:tensorflow:From full_code.py:27: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02. 
Instructions for updating: 
Use `tf.global_variables_initializer` instead. 
2017-06-05 11:01:55.853842: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations. 
2017-06-05 11:01:55.853867: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations. 
2017-06-05 11:01:55.853876: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations. 
2017-06-05 11:01:55.853886: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations. 
2017-06-05 11:01:55.853893: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations. 
2017-06-05 11:01:55.937978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 660 Ti 
major: 3 minor: 0 memoryClockRate (GHz) 1.0455 
pciBusID 0000:04:00.0 
Total memory: 2.95GiB 
Free memory: 2.91GiB 
2017-06-05 11:01:55.938063: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x19e5370 
2017-06-05 11:01:56.014220: E tensorflow/core/common_runtime/direct_session.cc:137] Internal: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE 

Here are the full specs.

Ubuntu 14.04 
CUDA 8.0 
GeForce GTX 660 Ti 
python 3.4.3 

Did you verify your CUDA installation?

@RobertCrovella Not sure how? – horaceT

Check the CUDA Linux installation guide.

Answer


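Thanks to folks from Google, I figured out what went wrong. This Dell box has two Nvidia graphics cards. The first one, which came from the manufacturer, is an NVS 310. As far as I know, it has no real compute power, and I never intended to make much use of it.

I then added a second card, a GTX 660 Ti, which I intend to use for all computation.

When TensorFlow is invoked, it defaults to device 0, which is the NVS 310, and of course it throws an invalid device error.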

When I run it like this,

CUDA_VISIBLE_DEVICES=1 python myscript.py

it works.
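For anyone who would rather not prefix every command, the same restriction can probably be applied from inside the script. This is only a minimal sketch: `restrict_visible_gpus` is a hypothetical helper I made up for illustration, not a TensorFlow API, and index 1 is simply the value that worked on this machine. CUDA numbers devices "fastest first" by default, which may differ from the order nvidia-smi shows, so the optional `CUDA_DEVICE_ORDER=PCI_BUS_ID` setting is included to make the two numberings agree.

```python
import os


def restrict_visible_gpus(indices, pci_bus_order=False):
    """Expose only the given CUDA device indices to this process.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    if pci_bus_order:
        # Number devices by PCI bus ID (as nvidia-smi does) instead of
        # CUDA's default "fastest first" ordering.
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Set the variable before TensorFlow initializes CUDA; doing it before the
# import is the simplest way to guarantee that.
restrict_visible_gpus([1])

# import tensorflow as tf  # uncomment where TensorFlow is installed
# sess = tf.Session()      # should now see only the selected card
```

The TensorFlow lines are left commented out so the snippet runs anywhere. The variable has to be set before the first CUDA call in the process, which is why the helper is called ahead of the import.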


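So the solution involves hardware details that you completely neglected to mention in your question? – talonmies

@talonmies Totally my bad. I have learned more about how CUDA behaves when multiple GPUs are present. – horaceT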