2013-05-08 28 views

How do I fix the encog "kernel launch failure" error when running "./encog benchmark /gpu:1"?

As part of testing my encog installation, I ran ./encog benchmark /gpu:0, which works fine, but when I run ./encog benchmark /gpu:1 I get:

encog-core/cuda_eval.cu(286) : getLastCudaError() CUDA error : kernel launch failure : (13) invalid device symbol. 

I'm on Ubuntu 11.10. I got the source code from https://github.com/encog/encog-c, and "make ARCH=64 CUDA=1" completed without errors.

Thanks for any help resolving this.

Here is the console output of the benchmark run that works fine:

[email protected]:~/a01-neuralnet-encog/encog-c-master$ ./encog benchmark /gpu:0 

* * Encog C/C++ (64 bit, CUDA) Command Line v1.0 * * 
Copyright 2012 by Heaton Research, Released under the Apache License 
Build Date: May 4 2013 07:24:00 
Processor/Core Count: 32 
Basic Data Type: double (64 bits) 
GPU: disabled 
Input Count: 10 
Ideal Count: 1 
Records: 10000 
Iterations: 100 

Performing benchmark...please wait 
Benchmark time(seconds): 3.2856 
Benchmark time includes only training time. 

Encog Finished. Run time 00:00:03.2904 

===============================

Here is the benchmark run that has the problem:

[email protected]:~/a01-neuralnet-encog/encog-c-master$ ./encog benchmark /gpu:1 

* * Encog C/C++ (64 bit, CUDA) Command Line v1.0 * * 
Copyright 2012 by Heaton Research, Released under the Apache License 
Build Date: May 4 2013 07:24:00 
Processor/Core Count: 32 
Basic Data Type: double (64 bits) 
GPU: enabled 
Input Count: 10 
Ideal Count: 1 
Records: 10000 
Iterations: 100 

Performing benchmark...please wait 
encog-core/cuda_eval.cu(286) : getLastCudaError() CUDA error : kernel launch failure : (13) invalid device symbol. 

===============================

Here is my GPU environment:

[email protected]:~/a01-neuralnet-encog/encog-c-master$ ./encog cuda 

* * Encog C/C++ (64 bit, CUDA) Command Line v1.0 * * 
Copyright 2012 by Heaton Research, Released under the Apache License 
Build Date: May 4 2013 07:24:00 
Processor/Core Count: 32 
Basic Data Type: double (64 bits) 
GPU: enabled 
Device 0: GeForce GTX 690 
CUDA Driver Version/Runtime Version 5.0/5.0 
CUDA Capability Major/Minor version number: 3.0 
Total amount of global memory: 2048 MBytes (2147287040 bytes) 

(8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores 
GPU Clock Speed: 1.02 GHz 
Total amount of constant memory: 65536 bytes 
Total amount of shared memory per block: 49152 bytes 
Total number of registers available per block: 65536 
Warp size: 32 
Maximum number of threads per block: 1024 
Maximum sizes of each dimension of a block: 1024 x 1024 x 64 
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535 
Maximum memory pitch: 2147483647 bytes 
Texture alignment: 512 bytes 
Device 1: GeForce GTX 690 
CUDA Driver Version/Runtime Version 5.0/5.0 
CUDA Capability Major/Minor version number: 3.0 
Total amount of global memory: 2048 MBytes (2147287040 bytes) 

(8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores 
GPU Clock Speed: 1.02 GHz 
Total amount of constant memory: 65536 bytes 
Total amount of shared memory per block: 49152 bytes 
Total number of registers available per block: 65536 
Warp size: 32 
Maximum number of threads per block: 1024 
Maximum sizes of each dimension of a block: 1024 x 1024 x 64 
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535 
Maximum memory pitch: 2147483647 bytes 
Texture alignment: 512 bytes 
Device 2: GeForce GTX 690 
CUDA Driver Version/Runtime Version 5.0/5.0 
CUDA Capability Major/Minor version number: 3.0 
Total amount of global memory: 2048 MBytes (2147287040 bytes) 

(8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores 
GPU Clock Speed: 1.02 GHz 
Total amount of constant memory: 65536 bytes 
Total amount of shared memory per block: 49152 bytes 
Total number of registers available per block: 65536 
Warp size: 32 
Maximum number of threads per block: 1024 
Maximum sizes of each dimension of a block: 1024 x 1024 x 64 
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535 
Maximum memory pitch: 2147483647 bytes 
Texture alignment: 512 bytes 
Device 3: GeForce GTX 690 
CUDA Driver Version/Runtime Version 5.0/5.0 
CUDA Capability Major/Minor version number: 3.0 
Total amount of global memory: 2048 MBytes (2147287040 bytes) 

(8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores 
GPU Clock Speed: 1.02 GHz 
Total amount of constant memory: 65536 bytes 
Total amount of shared memory per block: 49152 bytes 
Total number of registers available per block: 65536 
Warp size: 32 
Maximum number of threads per block: 1024 
Maximum sizes of each dimension of a block: 1024 x 1024 x 64 
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535 
Maximum memory pitch: 2147483647 bytes 
Texture alignment: 512 bytes 
Device 4: GeForce GTX 690 
CUDA Driver Version/Runtime Version 5.0/5.0 
CUDA Capability Major/Minor version number: 3.0 
Total amount of global memory: 2048 MBytes (2147287040 bytes) 

(8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores 
GPU Clock Speed: 1.02 GHz 
Total amount of constant memory: 65536 bytes 
Total amount of shared memory per block: 49152 bytes 
Total number of registers available per block: 65536 
Warp size: 32 
Maximum number of threads per block: 1024 
Maximum sizes of each dimension of a block: 1024 x 1024 x 64 
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535 
Maximum memory pitch: 2147483647 bytes 
Texture alignment: 512 bytes 
Device 5: GeForce GTX 690 
CUDA Driver Version/Runtime Version 5.0/5.0 
CUDA Capability Major/Minor version number: 3.0 
Total amount of global memory: 2048 MBytes (2147287040 bytes) 

(8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores 
GPU Clock Speed: 1.02 GHz 
Total amount of constant memory: 65536 bytes 
Total amount of shared memory per block: 49152 bytes 
Total number of registers available per block: 65536 
Warp size: 32 
Maximum number of threads per block: 1024 
Maximum sizes of each dimension of a block: 1024 x 1024 x 64 
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535 
Maximum memory pitch: 2147483647 bytes 
Texture alignment: 512 bytes 
Device 6: GeForce GTX 690 
CUDA Driver Version/Runtime Version 5.0/5.0 
CUDA Capability Major/Minor version number: 3.0 
Total amount of global memory: 2048 MBytes (2147287040 bytes) 

(8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores 
GPU Clock Speed: 1.02 GHz 
Total amount of constant memory: 65536 bytes 
Total amount of shared memory per block: 49152 bytes 
Total number of registers available per block: 65536 
Warp size: 32 
Maximum number of threads per block: 1024 
Maximum sizes of each dimension of a block: 1024 x 1024 x 64 
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535 
Maximum memory pitch: 2147483647 bytes 
Texture alignment: 512 bytes 
Device 7: GeForce GTX 690 
CUDA Driver Version/Runtime Version 5.0/5.0 
CUDA Capability Major/Minor version number: 3.0 
Total amount of global memory: 2048 MBytes (2147287040 bytes) 

(8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores 
GPU Clock Speed: 1.02 GHz 
Total amount of constant memory: 65536 bytes 
Total amount of shared memory per block: 49152 bytes 
Total number of registers available per block: 65536 
Warp size: 32 
Maximum number of threads per block: 1024 
Maximum sizes of each dimension of a block: 1024 x 1024 x 64 
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535 
Maximum memory pitch: 2147483647 bytes 
Texture alignment: 512 bytes 
Performing CUDA test. 
Vector Addition 
CUDA Vector Add Test was successful. 
Encog Finished. Run time 00:00:10.9206 

===============================

Here is the output of my "make":

[email protected]:~/a01-neuralnet-encog/encog-c-master$ make ARCH=64 CUDA=1 
mkdir -p ./obj-cmd 
gcc -c -o obj-cmd/encog-cmd.o encog-cmd/encog-cmd.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-cmd 
gcc -c -o obj-cmd/cuda_test.o encog-cmd/cuda_test.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-cmd 
gcc -c -o obj-cmd/node_unix.o encog-cmd/node_unix.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-cmd 
/usr/local/cuda/bin/nvcc -o obj-cmd/cuda_vecadd.cu.o -c encog-cmd/cuda_vecadd.cu -I./encog-core/ -m64 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/activation.o encog-core/activation.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/errorcalc.o encog-core/errorcalc.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/network_io.o encog-core/network_io.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/util.o encog-core/util.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/util_str.o encog-core/util_str.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/data.o encog-core/data.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/errors.o encog-core/errors.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/network.o encog-core/network.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/pso.o encog-core/pso.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/util_file.o encog-core/util_file.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/vector.o encog-core/vector.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/encog.o encog-core/encog.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/nm.o encog-core/nm.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/object.o encog-core/object.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/rprop.o encog-core/rprop.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/hash.o encog-core/hash.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
gcc -c -o obj-lib/train.o encog-core/train.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include 
mkdir -p ./obj-lib 
/usr/local/cuda/bin/nvcc -o obj-lib/encog_cuda.cu.o -c encog-core/encog_cuda.cu -I./encog-core/ -m64 
mkdir -p ./obj-lib 
/usr/local/cuda/bin/nvcc -o obj-lib/cuda_eval.cu.o -c encog-core/cuda_eval.cu -I./encog-core/ -m64 
ptxas /tmp/tmpxft_00001b04_00000000-5_cuda_eval.ptx, line 141; warning : Double is not supported. Demoting to float 
mkdir -p ./lib 
ar rcs ./lib/encog.a ./obj-lib/activation.o ./obj-lib/errorcalc.o ./obj-lib/network_io.o ./obj-lib/util.o ./obj-lib/util_str.o ./obj-lib/data.o ./obj-lib/errors.o ./obj-lib/network.o ./obj-lib/pso.o ./obj-lib/util_file.o ./obj-lib/vector.o ./obj-lib/encog.o ./obj-lib/nm.o ./obj-lib/object.o ./obj-lib/rprop.o ./obj-lib/hash.o ./obj-lib/train.o ./obj-lib/encog_cuda.cu.o ./obj-lib/cuda_eval.cu.o 
gcc -o encog obj-cmd/encog-cmd.o obj-cmd/cuda_test.o obj-cmd/node_unix.o obj-cmd/cuda_vecadd.cu.o lib/encog.a -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include -lm ./lib/encog.a -L/usr/local/cuda/lib64 -lcudart 
[email protected]:~/a01-neuralnet-encog/encog-c-master$ 

Answer


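I tried running this on my GeForce 580 and had no problems. My platform differs from yours, since you have a 6-series card. I looked the error up in several places on Google. It looks like it might be an issue with the way local memory is used, which might not work on the 6 series. You might want to file an issue here: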

https://github.com/encog/encog-c/issues
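
One more thing that may be worth ruling out before filing: in the build log from the question, nvcc is invoked without an -arch option, and ptxas warns "Double is not supported. Demoting to float", which is what it reports when targeting an architecture older than compute capability 1.3. The "./encog cuda" output shows the cards are compute capability 3.0. The sketch below only prints the two library nvcc commands from that log with an explicit -arch=sm_30 added; it does not run them. The helper name cc_to_arch is hypothetical and not part of encog-c, and whether this change has any effect on the "invalid device symbol" error is an untested assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of encog-c): convert a compute capability
# such as "3.0" into the matching nvcc architecture name, e.g. "sm_30".
cc_to_arch() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Values taken from the question: the nvcc path comes from the build log,
# the compute capability 3.0 from the "./encog cuda" output.
NVCC=/usr/local/cuda/bin/nvcc
ARCH_FLAG="-arch=$(cc_to_arch 3.0)"

# Print (rather than run) the two library nvcc commands from the build log
# with the explicit architecture flag added.
for src in encog_cuda cuda_eval; do
    echo "$NVCC $ARCH_FLAG -o obj-lib/$src.cu.o -c encog-core/$src.cu -I./encog-core/ -m64"
done
```

If the printed commands are run by hand from the source directory, the "ar rcs" and final "gcc -o encog" steps from the build log would need to be repeated afterwards so that the new object files end up in the executable. If the ptxas warning disappears but the benchmark still fails, that result is useful detail to include in the issue report.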