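CUDA runtime GPU initialization with Theano

I want to parallelize my neural network across two GPUs, following https://github.com/uoguelph-mlrg/theano_multi_gpu. I have all the dependencies, but CUDA runtime initialization fails with the following message.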

ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device 0 failed: 
cublasCreate() returned this error 'the CUDA Runtime initialization failed' 
Error when trying to find the memory information on the GPU: invalid device ordinal 
Error allocating 24 bytes of device memory (invalid device ordinal). Driver report 0 bytes free and 0 bytes total 
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed: 
CudaNdarray_ZEROS: allocation failed. 
Process Process-1: 
Traceback (most recent call last): 
    File "/opt/share/Python-2.7.9/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap 
    self.run() 
    File "/opt/share/Python-2.7.9/lib/python2.7/multiprocessing/process.py", line 114, in run 
    self._target(*self._args, **self._kwargs) 
    File "/u/bsankara/nt/Git-nt/nt/train_attention.py", line 171, in launch_train 
    clip_c=1.) 
    File "/u/bsankara/nt/Git-nt/nt/nt.py", line 1616, in train 
    import theano.sandbox.cuda 
    File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/__init__.py", line 98, in <module> 
    theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1() 
    File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/tests/test_driver.py", line 30, in test_nvidia_driver1 
    A = cuda.shared_constructor(a) 
    File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/var.py", line 181, in float32_shared_constructor 
    enable_cuda=False) 
    File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py", line 389, in use 
    cuda_ndarray.cuda_ndarray.CudaNdarray.zeros((2, 3)) 
RuntimeError: ('CudaNdarray_ZEROS: allocation failed.', 'You asked to force this device and it failed. No fallback to the cpu or other gpu device.')

The relevant part of the code is below. The error is triggered when `import theano.sandbox.cuda` runs:

import os
from multiprocessing import Queue
import zmq
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

def train(private_args, process_env, <some other args>):
    if process_env is not None:
        os.environ = process_env

    #### 
    # pycuda and zmq environment 

    drv.init() 
    dev = drv.Device(private_args['ind_gpu']) 
    ctx = dev.make_context() 
    sock = zmq.Context().socket(zmq.PAIR) 

    if private_args['flag_client']:
        sock.connect('tcp://localhost:5000')
    else:
        sock.bind('tcp://*:5000')

    #### 
    # import theano stuffs 
    import theano.sandbox.cuda 
    theano.sandbox.cuda.use(private_args['gpu']) 

    import theano 
    import theano.tensor as tensor 
    from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams 
    import theano.misc.pycuda_init 
    import theano.misc.pycuda_utils 
...

Here is how I launch the training function as two processes:

import os
from multiprocessing import Process

def launch_train(curr_args, process_env, curr_queue, oth_queue):
    trainerr, validerr, testerr = train(private_args=curr_args,
                                        process_env=process_env,
                                        ...)

process1_env = os.environ.copy() 
process1_env['THEANO_FLAGS'] = "cuda.root=/opt/share/cuda-7.0,device=gpu0,floatX=float32,on_unused_input=ignore,optimizer=fast_run,exception_verbosity=high,compiledir=/u/bsankara/.theano/NT_multi_GPU1" 
process2_env = os.environ.copy() 
process2_env['THEANO_FLAGS'] = "cuda.root=/opt/share/cuda-7.0,device=gpu1,floatX=float32,on_unused_input=ignore,optimizer=fast_run,exception_verbosity=high,compiledir=/u/bsankara/.theano/NT_multi_GPU2" 

p = Process(target=launch_train,
            args=(p_args, process1_env, queue_p, queue_q))
q = Process(target=launch_train,
            args=(q_args, process2_env, queue_q, queue_p))

p.start() 
q.start() 
p.join() 
q.join()
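The two THEANO_FLAGS strings above differ only in the device and the compile directory, so they could be built by a small helper. The following is a minimal sketch, not part of the original post: `make_theano_env` is a hypothetical name, and the flag values are simply the ones copied from the strings above.

```python
import os


def make_theano_env(device, compiledir, base_env=None):
    """Return a copy of the environment with THEANO_FLAGS set for one worker.

    Hypothetical helper: the flag list mirrors the strings used in the post.
    """
    # Copy so the parent's environment (or the caller's dict) is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    flags = [
        "cuda.root=/opt/share/cuda-7.0",
        "device=%s" % device,
        "floatX=float32",
        "on_unused_input=ignore",
        "optimizer=fast_run",
        "exception_verbosity=high",
        "compiledir=%s" % compiledir,
    ]
    env["THEANO_FLAGS"] = ",".join(flags)
    return env


process1_env = make_theano_env("gpu0", "/u/bsankara/.theano/NT_multi_GPU1")
process2_env = make_theano_env("gpu1", "/u/bsankara/.theano/NT_multi_GPU2")
```

Giving each process its own compiledir, as the original strings do, keeps the two workers from sharing one compilation cache.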

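However, if I try to initialize the GPU interactively in Python, the import statement seems to work. I executed the first 20 lines of train() and it worked fine there, correctly assigning me gpu0 as I requested.

Source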

2015-09-24 baskaran

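I tried some debugging with pdb, and it seems to fail in the file /opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py, in 'def use(device, force=False, default_to_move_computation_to_gpu=True, move_shared_float32_to_gpu=True, enable_cuda=True, test_driver=True):'. Specifically, it crashes at the call 'gpu_init(device)'. 'device' has the value '0', derived from 'gpu0', and it fails with the message: RuntimeError: "cublasCreate() returned this error 'the CUDA Runtime initialization failed'" – baskaran

Does the 'dual_mlp.py' code (in the GitHub repository you linked to) run without modification? Have you tried going back to the original/official documentation on this topic (https://github.com/Theano/Theano/wiki/Using-Multiple-GPUs)? –

@Daniel, the official documentation and dual_mlp.py use the same approach. Both launch child processes and then import 'theano.sandbox.cuda' to bind to a GPU. AFAIK the only difference is that dual_mlp.py uses PyCUDA functions for GPU-to-GPU transfers, to avoid the latency of going through host memory, whereas the official documentation suggests using multiprocessing queues. I haven't tried running dual_mlp.py myself, but I corresponded privately with one of the authors, who said it works for them. I will check this. – baskaran

After digging around and running pdb, the original poster found the problem.

Basically, theano and pycuda were both competing to initialize the GPU, which caused the failure. The solution is to `import theano` first, which acquires a GPU, and then attach to that existing context from pycuda. So the import part inside the train function looks like this: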

import os

def train(private_args, process_env, <some other args>):
    if process_env is not None:
        os.environ = process_env

    #### 
    # import theano related 
    # We need global imports and so we make them as such 
    theano = __import__('theano') 
    _t_tensor = __import__('theano', globals(), locals(), ['tensor'], -1) 
    tensor = _t_tensor.tensor 

    import theano.sandbox.cuda 
    import theano.misc.pycuda_utils 

    #### 
    # pycuda and zmq environment 
    import zmq 
    import pycuda.driver as drv 
    import pycuda.gpuarray as gpuarray 

    drv.init() 
    # Attach the existing context (already initialized by theano import statement) 
    ctx = drv.Context.attach() 
    sock = zmq.Context().socket(zmq.PAIR) 

    if private_args['flag_client']:
        sock.connect('tcp://localhost:5000')
    else:
        sock.bind('tcp://*:5000')
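Because the fix depends entirely on import order, it may be worth guarding the pycuda step with an explicit check. The following is a minimal sketch under that assumption; `require_imported` is a hypothetical helper, not part of theano or pycuda. It only confirms that the module has been imported in this process, and does not prove that the GPU context was actually created.

```python
import sys


def require_imported(module_name, loaded=None):
    """Raise RuntimeError unless module_name has already been imported.

    Hypothetical guard: `loaded` defaults to sys.modules and exists only
    so the check can be exercised without touching real modules.
    """
    loaded = sys.modules if loaded is None else loaded
    if module_name not in loaded:
        raise RuntimeError(
            "%s must be imported before attaching a pycuda context"
            % module_name)
    return loaded[module_name]


# Intended use inside train(), just before drv.Context.attach():
#     require_imported('theano.sandbox.cuda')
```

If the imports are later reordered by accident, this fails with a readable message instead of the cublasCreate() error shown in the question.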

[This answer was added as a community wiki entry from an edit by the OP, in an attempt to get this question off the unanswered list.]

Source

2016-06-23 09:52:09 talonmies
