2017-09-30 72 views

I have two questions to put to you: CUDA and C++ linking/compilation, and a program that hangs on cudaMalloc.

I)

I have a .cpp file containing main(). To call the kernel (which lives in a .cu file), I use an extern function in the .cu file, launch(), which launches the kernel. The two files, the .cu and the .cpp, each compile successfully on their own. To link them together, since I am a CUDA beginner, I tried two things:

1) nvcc -Wno-deprecated-gpu-targets -o final file1.cpp file2.cu, which compiles and links the final program successfully without any errors, and

2)

nvcc -Wno-deprecated-gpu-targets -c file2.cu 
    g++ -c file1.cpp 
    g++ -o program file1.o file2.o -lcudart -lcurand -lcutil -lcudpp -lcuda 

In the second case, the -l arguments are not recognized (only -lcuda is), I guess because I did not specify their paths; I do not know where those libraries are stored. If I skip those -l arguments, the errors are:

$ g++ -o final backpropalgorithm_CUDA_kernel_copy.o backpropalgorithm_CUDA_main_copy.o -lcuda 
backpropalgorithm_CUDA_kernel_copy.o: In function `launch': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x185): undefined reference to `cudaConfigureCall' 
backpropalgorithm_CUDA_kernel_copy.o: In function `__cudaUnregisterBinaryUtil()': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x259): undefined reference to `__cudaUnregisterFatBinary' 
backpropalgorithm_CUDA_kernel_copy.o: In function `__nv_init_managed_rt_with_module(void**)': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x274): undefined reference to `__cudaInitModule' 
backpropalgorithm_CUDA_kernel_copy.o: In function `__device_stub__Z21neural_network_kernelPfPiS0_PdS1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_S1_(float*, int*, int*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*, double*)': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x2ac): undefined reference to `cudaSetupArgument' 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x2cf): undefined reference to `cudaSetupArgument' 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x2f2): undefined reference to `cudaSetupArgument' 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x315): undefined reference to `cudaSetupArgument' 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x338): undefined reference to `cudaSetupArgument' 
backpropalgorithm_CUDA_kernel_copy.o:tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x35b): more undefined references to `cudaSetupArgument' follow 
backpropalgorithm_CUDA_kernel_copy.o: In function `__nv_cudaEntityRegisterCallback(void**)': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x663): undefined reference to `__cudaRegisterFunction' 
backpropalgorithm_CUDA_kernel_copy.o: In function `__sti____cudaRegisterAll_69_tmpxft_0000717b_00000000_7_backpropalgorithm_CUDA_kernel_copy_cpp1_ii_43082cd7()': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x67c): undefined reference to `__cudaRegisterFatBinary' 
backpropalgorithm_CUDA_kernel_copy.o: In function `cudaError cudaLaunch<char>(char*)': 
tmpxft_0000717b_00000000-4_backpropalgorithm_CUDA_kernel_copy.cudafe1.cpp:(.text+0x6c0): undefined reference to `cudaLaunch' 
backpropalgorithm_CUDA_main_copy.o: In function `main': 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x92): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0xf8): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x118): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x12c): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x14c): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x160): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x180): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x194): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1b4): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1c8): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1e8): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x1ff): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x21f): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x236): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x256): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x26a): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x28a): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2a1): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2c1): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2d5): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x2f5): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x309): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x329): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x33d): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x35d): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x371): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x391): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3a5): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3c5): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3dc): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x3fc): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x413): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x433): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x44a): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x46a): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x481): undefined reference to `cudaMalloc' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x4a1): undefined reference to `cudaMemcpy' 
backpropalgorithm_CUDA_main_copy.cpp:(.text+0x5bf): undefined reference to `cudaDeviceSynchronize' 
collect2: error: ld returned 1 exit status 

The thing is, in the first case, with the "successful" compile and link, when I run the program the only output in the console is a blinking cursor (on the line after the command) and nothing else; normally it should compute with CUDA and print the error of the neural network being trained.

II) I tried putting printf() in the .cu file, but it printed nothing. I searched around and found that perhaps I should use the cuPrintf() function. I tried that, but I ran into header problems: the symbols they declare came up undefined even though I included the headers manually. I found that I should also include a cuPrintf.cu file, whose source code I found online.

Unfortunately, when I then compile the files separately, the error in the .cu file is

ptxas fatal : Unresolved extern function '_Z8cuPrintfIjEiPKcT_' 

The .cpp compiles without errors, though.

Why do all these errors appear? Where is the faulty part? Why does the program not run correctly, and why does printf() seem not to work in the kernel? Why does the program show only a blinking cursor and nothing else? If anyone could enlighten me on these questions I would be very grateful; many thanks in advance!

The code of my two files is:

file1.cpp


file2.cu

#define w(i,j) w[(i)*(InputN*hn) + (j)] 
#define v(i,j) v[(i)*(hn*OutN) + (j)] 
#define x_out(i,j) x_out[(i)*(InputN) + (j)] 
#define y(i,j) y[(i)*(OutN) + (j)] 
#define hn_out(i,j) hn_out[(i)*(hn) + (j)] 
#define y_out(i,j) y_out[(i)*(OutN) + (j)] 
#define y_delta(i,j) y_delta[(i)*(OutN) + (j)] 
#define hn_delta(i,j) hn_delta[(i)*(hn) + (j)] 
#define deltav(i,j) deltav[(i)*(hn*OutN) + (j)] 
#define deltaw(i,j) deltaw[(i)*(InputN*hn) + (j)] 

#define datanum 4  // number of training samples 
#define InputN 16  // number of neurons in the input layer 
#define hn 64   // number of neurons in the hidden layer 
#define OutN 1   // number of neurons in the output layer 
#define threads_per_block 256 
#define MAX_RAND 100 
#define MIN_RAND 10 

#include <stdio.h> 
#include <math.h> //for truncf() 


// sigmoid serves as activation function 
__device__ double sigmoid(double x){ 
    return(1.0/(1.0 + exp(-x))); 
} 


__device__ int rand_kernel(int index, float *randData){ 
    float myrandf = randData[index]; 
    myrandf *= (MAX_RAND - MIN_RAND + 0.999999); 
    myrandf += MIN_RAND; 
    int myrand = (int)truncf(myrandf); 
    return myrand; 
} 


__global__ void neural_network_kernel (float *randData, int *times, int *loop, double *error, double *max, double *min, double *x_out, double *hn_out, double *y_out, double *y, double *w, double *v, double *deltaw, double *deltav, double *hn_delta, double *y_delta, double *alpha, double *beta, double *sumtemp, double *errtemp) 
{ 
    //int i = blockIdx.x; 
    //int idx = threadIdx.x; 
    //int idy = threadIdx.y 

    int index = blockIdx.x * blockDim.x + threadIdx.x; 

    // training set 
    struct{ 
     double input_kernel[InputN]; 
     double teach_kernel[OutN]; 
    }data_kernel[threads_per_block + datanum]; 

    if (index==0) 
    { 
     for(int m=0; m<datanum; m++){ 
      for(int i=0; i<InputN; i++) 
       data_kernel[threads_per_block + m].input_kernel[i] = (double)rand_kernel(index, randData)/32767.0; 
      for(int i=0;i<OutN;i++) 
       data_kernel[threads_per_block + m].teach_kernel[i] = (double)rand_kernel(index, randData)/32767.0; 
     } 
    } 


    // Initialization 
    for(int i=0; i<InputN; i++){ 
     for(int j=0; j<hn; j++){ 
      w(i,j) = ((double)rand_kernel(index, randData)/32767.0)*2-1; 
      deltaw(i,j) = 0; 
     } 
    } 
    for(int i=0; i<hn; i++){ 
     for(int j=0; j<OutN; j++){ 
      v(i,j) = ((double)rand_kernel(index, randData)/32767.0)*2-1; 
      deltav(i,j) = 0; 
     } 
    } 


    while(loop[index] < *times){ 
     loop[index]++; 
     error[index] = 0.0; 

     for(int m=0; m<datanum ; m++){ 
      // Feedforward 
      max[index] = 0.0; 
      min[index] = 0.0; 
      for(int i=0; i<InputN; i++){ 
       x_out(index,i) = data_kernel[threads_per_block + m].input_kernel[i]; 
       if(max[index] < x_out(index,i)) 
        max[index] = x_out(index,i); 
       if(min[index] > x_out(index,i)) 
        min[index] = x_out(index,i); 
      } 
      for(int i=0; i<InputN; i++){ 
       x_out(index,i) = (x_out(index,i) - min[index])/(max[index] - min[index]); 
      } 

      for(int i=0; i<OutN ; i++){ 
       y(index,i) = data_kernel[threads_per_block + m].teach_kernel[i]; 
      } 

      for(int i=0; i<hn; i++){ 
       sumtemp[index] = 0.0; 
       for(int j=0; j<InputN; j++) 
        sumtemp[index] += w(j,i) * x_out(index,j); 
       hn_out(index,i) = sigmoid(sumtemp[index]);  // sigmoid serves as the activation function 
      } 

      for(int i=0; i<OutN; i++){ 
       sumtemp[index] = 0.0; 
       for(int j=0; j<hn; j++) 
        sumtemp[index] += v(j,i) * hn_out(index,j); 
       y_out(index,i) = sigmoid(sumtemp[index]); 
      } 

      // Backpropagation 
      for(int i=0; i<OutN; i++){ 
       errtemp[index] = y(index,i) - y_out(index,i); 
       y_delta(index,i) = -errtemp[index] * sigmoid(y_out(index,i)) * (1.0 - sigmoid(y_out(index,i))); 
       error[index] += errtemp[index] * errtemp[index]; 
      } 

      for(int i=0; i<hn; i++){ 
       errtemp[index] = 0.0; 
       for(int j=0; j<OutN; j++) 
        errtemp[index] += y_delta(index,j) * v(i,j); 
       hn_delta(index,i) = errtemp[index] * (1.0 + hn_out(index,i)) * (1.0 - hn_out(index,i)); 
      } 

      // Stochastic gradient descent 
      for(int i=0; i<OutN; i++){ 
       for(int j=0; j<hn; j++){ 
        deltav(j,i) = (*alpha) * deltav(j,i) + (*beta) * y_delta(index,i) * hn_out(index,j); 
        v(j,i) -= deltav(j,i); 
       } 
      } 

      for(int i=0; i<hn; i++){ 
       for(int j=0; j<InputN; j++){ 
        deltaw(j,i) = (*alpha) * deltaw(j,i) + (*beta) * hn_delta(index,i) * x_out(index,j); 
        w(j,i) -= deltaw(j,i); 
       } 
      } 
     } 

     // Global error 
     error[index] = error[index]/2; 
     /*if(loop%1000==0){ 
      result = "Global Error = "; 
      sprintf(buffer, "%f", error); 
      result += buffer; 
      result += "\r\n"; 
     } 
     if(error < errlimit) 
      break;*/ 

     printf("The %d th training, error: %0.100f\n", loop[index], error[index]); 
    } 
} 


extern "C" 
void launch(float *randData, int *times, int *loop, double *error, double *max, double *min, double *x_out, double *hn_out, double *y_out, double *y, double *w, double *v, double *deltaw, double *deltav, double *hn_delta, double *y_delta, double *alpha, double *beta, double *sumtemp, double *errtemp) 
{ 
    int blocks = *times/threads_per_block; 
    neural_network_kernel<<<blocks, threads_per_block>>>(randData, times, loop, error, max, min, x_out, hn_out, y_out, y, w, v, deltaw, deltav, hn_delta, y_delta, alpha, beta, sumtemp, errtemp); 
} 

UPDATE:

I found some errors related to memory allocation with pointers. I have updated the code above... Now the main questions are:

1) Is the linking/compiling correct? Is that how I should do it? I mean the first way.

2) I found that the blinking cursor appears immediately at the first cudaMalloc(); up to that point the program runs correctly.

But at the first cudaMalloc() it hangs forever. Why is that?

1 Answer

Before seeking help here, it is good practice to use proper CUDA error checking and to run your code with cuda-memcheck. If you don't, you will likely miss useful error information, wasting your own time as well as that of others trying to help you.
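A minimal sketch of the kind of error checking meant here (the macro name gpuErrchk is illustrative, not part of the CUDA API):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call; report file and line on failure.
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line)
{
    if (code != cudaSuccess) {
        fprintf(stderr, "GPUassert: %s %s %d\n",
                cudaGetErrorString(code), file, line);
        exit(code);
    }
}

// Usage:
// gpuErrchk(cudaMalloc(&d_ptr, size));
// kernel<<<grid, block>>>(...);
// gpuErrchk(cudaGetLastError());      // catches launch-configuration errors
// gpuErrchk(cudaDeviceSynchronize()); // catches asynchronous execution errors
```

With this in place, a hang or silent failure at the first cudaMalloc() would instead produce a specific error string to investigate.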

In the second case, the -l arguments are not recognized (only -lcuda is), I guess because I did not specify their paths; I do not know where those files are stored.

You don't want to skip those. nvcc links against these libraries automatically and automatically knows where to find them. When using g++, you must tell it where to look and which specific libraries you need. Given the code you have, you don't need all of them when you link the libraries yourself, so the following should be sufficient for a standard Linux CUDA install:

g++ -o program file1.o file2.o -L/usr/local/cuda/lib64 -lcudart 

If you don't have a standard install, you can run which nvcc to find out where nvcc lives, and use that to find the likely place where the libraries are located (change bin in the path to lib64).
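For example, on a hypothetical install (the paths shown are illustrative, yours may differ):

```shell
$ which nvcc
/usr/local/cuda-8.0/bin/nvcc      # example output
$ g++ -o program file1.o file2.o -L/usr/local/cuda-8.0/lib64 -lcudart
```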

If you do actually need some of the other libraries: things like cutil and cudpp will not be usable unless you take special steps to install them, and in that case you would need to work out the paths to them yourself.

Regarding cuPrintf: you should not need it if you are compiling for and running on a cc2.0 or newer GPU (the minimum compute capability supported by CUDA 8 anyway). Ordinary printf should work in device code; if it does not (because you have a device-code error, which proper error checking and cuda-memcheck would surface), then cuPrintf will not work any better. So rather than wrestling with getting that to work, just revert the code to using printf (and include stdio.h).
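A minimal sketch of device-side printf (requires a cc2.0+ GPU; the cudaDeviceSynchronize() is needed so the device-side printf buffer is flushed before the program exits):

```cuda
#include <cstdio>

__global__ void hello()
{
    // Plain printf works in device code on cc2.0 and newer GPUs.
    printf("hello from thread %d\n", threadIdx.x);
}

int main()
{
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();  // flush device-side printf output
    return 0;
}
```

If a program like this prints nothing, the kernel almost certainly failed to launch or aborted, which is exactly what the error checking above would reveal.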

Regarding your program and why it does not work: I think you probably have several bugs. You may want to learn how to use a debugger. Right off the bat, in the host code, your attempt to initialize randData from host code is illegal.

Now that I see you have changed the question several times, turning it into a moving target, I will stop here.

If you want help, please stop moving the target.

Use proper CUDA error checking.