I have two questions to present to you: CUDA and C++ linking/compilation, and a program crash on cudaMalloc.
I)
I have a .cpp file containing main(). To call the kernel (which lives in a .cu file), I use an extern function in the .cu file, launch(), which invokes the kernel. The two files, the .cu and the .cpp, compile successfully on their own. To combine them, since I am a beginner in CUDA, I tried two things:
1) nvcc -Wno-deprecated-gpu-targets -o final file1.cpp file2.cu
which compiles and links the final program successfully, without any errors, and
2)
nvcc -Wno-deprecated-gpu-targets -c file2.cu
g++ -c file1.cpp
g++ -o program file1.o file2.o -lcudart -lcurand -lcutil -lcudpp -lcuda
In the second case, most of the -l arguments are not recognized (only -lcuda is). I guess this is because I did not specify their paths, since I don't know where these libraries are stored. If I skip those -l arguments, the errors are:
$ g++ -o final backpropalgorithm_CUDA_kernel_copy.o backpropalgorithm_CUDA_main_copy.o -lcuda
backpropalgorithm_CUDA_kernel_copy.o: In function `launch':
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x185): undefined reference to `cudaConfigureCall'
backpropalgorithm_CUDA_kernel_copy.o: In function `__cudaUnregisterBinaryUtil()':
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x259): undefined reference to `__cudaUnregisterFatBinary'
backpropalgorithm_CUDA_kernel_copy.o: In function `__nv_init_managed_rt_with_module(void**)':
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x274): undefined reference to `__cudaInitModule'
backpropalgorithm_CUDA_kernel_copy.o: In function `__device_stub__Z21neural_network_kernelPfPiS0_PdS1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_(float*, int*, int*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*)':
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x2ac): undefined reference to `cudaSetupArgument'
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x2cf): undefined reference to `cudaSetupArgument'
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x2f2): undefined reference to `cudaSetupArgument'
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x315): undefined reference to `cudaSetupArgument'
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x338): undefined reference to `cudaSetupArgument'
backpropalgorithm_CUDA_kernel_copy.o:tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x35b): more undefined references to `cudaSetupArgument' follow
backpropalgorithm_CUDA_kernel_copy.o: In function `__nv_cudaEntityRegisterCallback(void**)':
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x663): undefined reference to `__cudaRegisterFunction'
backpropalgorithm_CUDA_kernel_copy.o: In function `__sti____cudaRegisterAll_69_tmpxft_0000717b_00000000_7_backpropalgorithm_CUDA_kernel_copy_cpp1_ii_43082cd7()':
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x67c): undefined reference to `__cudaRegisterFatBinary'
backpropalgorithm_CUDA_kernel_copy.o: In function `cudaError cudaLaunch<char>(char*)':
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x6c0): undefined reference to `cudaLaunch'
backpropalgorithm_CUDA_main_copy.o: In function `main':
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x92): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0xf8): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x118): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x12c): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x14c): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x160): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x180): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x194): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1b4): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1c8): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1e8): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1ff): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x21f): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x236): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x256): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x26a): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x28a): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2a1): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2c1): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2d5): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2f5): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x309): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x329): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x33d): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x35d): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x371): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x391): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3a5): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3c5): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3dc): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3fc): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x413): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x433): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x44a): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x46a): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x481): undefined reference to `cudaMalloc'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x4a1): undefined reference to `cudaMemcpy'
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x5bf): undefined reference to `cudaDeviceSynchronize'
collect2: error: ld returned 1 exit status
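Those undefined references (cudaMalloc, cudaMemcpy, cudaLaunch, etc.) all come from the CUDA runtime library, libcudart. A sketch of how the second approach is usually made to link, assuming a default CUDA install under /usr/local/cuda (the path is an assumption and may differ on your system):

```shell
# Compile device code with nvcc, host code with g++
nvcc -Wno-deprecated-gpu-targets -c file2.cu -o file2.o
g++ -c file1.cpp -o file1.o
# -L adds the library search path; -lcudart resolves cudaMalloc, cudaMemcpy, etc.
g++ -o program file1.o file2.o -L/usr/local/cuda/lib64 -lcudart
```

On 32-bit installs the directory is typically lib rather than lib64; -lcutil and -lcudpp belong to separate, optional SDK libraries and are not needed just to call the runtime API.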
The thing is that with the "successful" compilation and linking of the first case, when I run the program, the only output shown in the console is a blinking cursor (on the line after the command), nothing else; normally it should use CUDA to compute and print the error of the neural network being trained.
II) I tried to use printf() inside the .cu file, but it printed nothing. I searched and found that perhaps I should use the cuPrintf() function instead. I tried it, but I ran into header problems: the files including them reported undefined symbols, even though I included the headers manually. I then found that I should also include a cuPrintf.cu file, whose source code I found online. Unfortunately, when I compile the files separately, the .cu file fails with the error
ptxas fatal : Unresolved extern function '_Z8cuPrintfIjEiPKcT_'
while the .cpp file compiles without errors.
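For what it's worth, on devices of compute capability 2.0 (Fermi) and later, plain printf() is supported directly inside kernels, and cuPrintf is only needed for older GPUs. Kernel printf output is buffered, however, and is only flushed at certain points such as a synchronization, so a program that exits without synchronizing may show nothing. A minimal sketch (the kernel name hello_kernel is made up for illustration):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical minimal kernel: each thread prints its global index.
__global__ void hello_kernel() {
    printf("hello from thread %d\n", blockIdx.x * blockDim.x + threadIdx.x);
}

int main() {
    hello_kernel<<<1, 4>>>();
    // Without this synchronization the process may exit before the
    // device-side printf buffer is flushed to the console.
    cudaDeviceSynchronize();
    return 0;
}
```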
Why do all these errors occur? Where is the faulty part? Why doesn't the program run correctly, and why does printf() appear not to work in the kernel? Why does the program show only a blinking cursor and nothing else? If anyone can enlighten me on these questions I would be very grateful. Many thanks in advance!
The code of my two files is:
file1.cpp:
file.cu:
#define w(i,j) w[(i)*(InputN*hn) + (j)]
#define v(i,j) v[(i)*(hn*OutN) + (j)]
#define x_out(i,j) x_out[(i)*(InputN) + (j)]
#define y(i,j) y[(i)*(OutN) + (j)]
#define hn_out(i,j) hn_out[(i)*(hn) + (j)]
#define y_out(i,j) y_out[(i)*(OutN) + (j)]
#define y_delta(i,j) y_delta[(i)*(OutN) + (j)]
#define hn_delta(i,j) hn_delta[(i)*(hn) + (j)]
#define deltav(i,j) deltav[(i)*(hn*OutN) + (j)]
#define deltaw(i,j) deltaw[(i)*(InputN*hn) + (j)]
#define datanum 4 // number of training samples
#define InputN 16 // number of neurons in the input layer
#define hn 64 // number of neurons in the hidden layer
#define OutN 1 // number of neurons in the output layer
#define threads_per_block 256
#define MAX_RAND 100
#define MIN_RAND 10
#include <stdio.h>
#include <math.h> //for truncf()
// sigmoid serves as the activation function
__device__ double sigmoid(double x){
return(1.0/(1.0 + exp(-x)));
}
__device__ int rand_kernel(int index, float *randData){
float myrandf = randData[index];
myrandf *= (MAX_RAND - MIN_RAND + 0.999999);
myrandf += MIN_RAND;
int myrand = (int)truncf(myrandf);
return myrand;
}
__global__ void neural_network_kernel (float *randData, int *times, int *loop, double *error, double *max, double *min, double *x_out, double *hn_out, double *y_out, double *y, double *w, double *v, double *deltaw, double *deltav, double *hn_delta, double *y_delta, double *alpha, double *beta, double *sumtemp, double *errtemp)
{
//int i = blockIdx.x;
//int idx = threadIdx.x;
//int idy = threadIdx.y
int index = blockIdx.x * blockDim.x + threadIdx.x;
// training set
struct{
double input_kernel[InputN];
double teach_kernel[OutN];
}data_kernel[threads_per_block + datanum];
if (index==0)
{
for(int m=0; m<datanum; m++){
for(int i=0; i<InputN; i++)
data_kernel[threads_per_block + m].input_kernel[i] = (double)rand_kernel(index, randData)/32767.0;
for(int i=0;i<OutN;i++)
data_kernel[threads_per_block + m].teach_kernel[i] = (double)rand_kernel(index, randData)/32767.0;
}
}
// Initialization
for(int i=0; i<InputN; i++){
for(int j=0; j<hn; j++){
w(i,j) = ((double)rand_kernel(index, randData)/32767.0)*2-1;
deltaw(i,j) = 0;
}
}
for(int i=0; i<hn; i++){
for(int j=0; j<OutN; j++){
v(i,j) = ((double)rand_kernel(index, randData)/32767.0)*2-1;
deltav(i,j) = 0;
}
}
while(loop[index] < *times){
loop[index]++;
error[index] = 0.0;
for(int m=0; m<datanum ; m++){
// Feedforward
max[index] = 0.0;
min[index] = 0.0;
for(int i=0; i<InputN; i++){
x_out(index,i) = data_kernel[threads_per_block + m].input_kernel[i];
if(max[index] < x_out(index,i))
max[index] = x_out(index,i);
if(min[index] > x_out(index,i))
min[index] = x_out(index,i);
}
for(int i=0; i<InputN; i++){
x_out(index,i) = (x_out(index,i) - min[index])/(max[index] - min[index]);
}
for(int i=0; i<OutN ; i++){
y(index,i) = data_kernel[threads_per_block + m].teach_kernel[i];
}
for(int i=0; i<hn; i++){
sumtemp[index] = 0.0;
for(int j=0; j<InputN; j++)
sumtemp[index] += w(j,i) * x_out(index,j);
hn_out(index,i) = sigmoid(sumtemp[index]); // sigmoid serves as the activation function
}
for(int i=0; i<OutN; i++){
sumtemp[index] = 0.0;
for(int j=0; j<hn; j++)
sumtemp[index] += v(j,i) * hn_out(index,j);
y_out(index,i) = sigmoid(sumtemp[index]);
}
// Backpropagation
for(int i=0; i<OutN; i++){
errtemp[index] = y(index,i) - y_out(index,i);
y_delta(index,i) = -errtemp[index] * sigmoid(y_out(index,i)) * (1.0 - sigmoid(y_out(index,i)));
error[index] += errtemp[index] * errtemp[index];
}
for(int i=0; i<hn; i++){
errtemp[index] = 0.0;
for(int j=0; j<OutN; j++)
errtemp[index] += y_delta(index,j) * v(i,j);
hn_delta(index,i) = errtemp[index] * (1.0 + hn_out(index,i)) * (1.0 - hn_out(index,i));
}
// Stochastic gradient descent
for(int i=0; i<OutN; i++){
for(int j=0; j<hn; j++){
deltav(j,i) = (*alpha) * deltav(j,i) + (*beta) * y_delta(index,i) * hn_out(index,j);
v(j,i) -= deltav(j,i);
}
}
for(int i=0; i<hn; i++){
for(int j=0; j<InputN; j++){
deltaw(j,i) = (*alpha) * deltaw(j,i) + (*beta) * hn_delta(index,i) * x_out(index,j);
w(j,i) -= deltaw(j,i);
}
}
}
// Global error
error[index] = error[index]/2;
/*if(loop%1000==0){
result = "Global Error = ";
sprintf(buffer, "%f", error);
result += buffer;
result += "\r\n";
}
if(error < errlimit)
break;*/
printf("The %d th training, error: %0.100f\n", loop[index], error[index]);
}
}
extern "C"
void launch(float *randData, int *times, int *loop, double *error, double *max, double *min, double *x_out, double *hn_out, double *y_out, double *y, double *w, double *v, double *deltaw, double *deltav, double *hn_delta, double *y_delta, double *alpha, double *beta, double *sumtemp, double *errtemp)
{
int blocks = *times/threads_per_block;
neural_network_kernel<<<blocks, threads_per_block>>>(randData, times, loop, error, max, min, x_out, hn_out, y_out, y, w, v, deltaw, deltav, hn_delta, y_delta, alpha, beta, sumtemp, errtemp);
}
UPDATE:
I found some errors regarding memory allocation with pointers. I have updated the code above... Now the main questions are:
1) Is the linking/compilation correct, and is that how I should be doing it? I mean the first approach.
2) I found that the blinking cursor appears immediately at the first cudaMalloc(). Up to that point the program runs correctly, but at the first cudaMalloc() it hangs forever. Why is that?