After a "bad" call to some function, CUDA can no longer copy data from device to host

I am testing some code in which the kernel is meant to perform a simple sum between two values stored in two pointers.

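After calling the kernel 'add', I can no longer copy data from host to device and then back from device to host, even when the kernel performs no operation on those pointers. But when I comment out the statement where the function is called, I get the correct results. Here is the code: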

#include <stdio.h> 
#include <stdlib.h> 
#include <cuda_runtime.h> 

__global__ void add(int *a, int *b, int *c) 
{ 
*c = *a - *b; 
} 

int main(void) 
{ 
int result, x_val, y_val; //Data copied from device to host is stored in these variables. 
int *x_host, *y_host; //Pointers in host 
int *tempGPU, *x_dev, *y_dev; //Pointers in device 

x_host = (int *)malloc(sizeof(int)); 
y_host = (int *)malloc(sizeof(int)); 

*x_host = 8; 
*y_host = 4; 

x_val = -5; 
y_val = -10; 

printf("\n x = %d, y = %d\n", *x_host, *y_host); 

cudaMalloc((void **)&tempGPU, sizeof(int)); 

//It's wrong to pass these arguments (host pointers) to the kernel. The problem is in this statement. 
add<<<1,1>>> (x_host, y_host, tempGPU); 

cudaMemcpy(&result, tempGPU, sizeof(int), cudaMemcpyDeviceToHost); 

printf("\n x_host - y_host = %d\n", result); 

cudaMalloc((void **)&x_dev, sizeof(int)); 
cudaMalloc((void **)&y_dev, sizeof(int)); 

*x_host = 6; 
*y_host = 20; 

cudaMemcpy(x_dev, x_host, sizeof(int), cudaMemcpyHostToDevice); 
cudaMemcpy(y_dev, y_host, sizeof(int), cudaMemcpyHostToDevice); 

cudaMemcpy(&x_val, x_dev, sizeof(int), cudaMemcpyDeviceToHost); 
cudaMemcpy(&y_val, y_dev, sizeof(int), cudaMemcpyDeviceToHost); 

printf("\n x_host = %d, y_host = %d\n", *x_host, *y_host); 
printf("\n x_val = %d, y_val = %d\n", x_val, y_val); 

cudaFree(tempGPU); 

printf("\nCUDA: %s\n", cudaGetErrorString(cudaGetLastError())); 

return 0; 

}

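I know the function needs pointers allocated on the device, but why does such a mistake prevent me from using cudaMemcpy correctly afterwards? Why, when I comment out the line:

add<<<1,1>>> (x_host, y_host, tempGPU);

do I get the correct results? Thanks.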

Source

2014-02-08 LeonelG

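Your problem is that 'x_host' and 'y_host' are pointers to host memory space. The '__global__ add' function needs pointers to device memory space. The way you have built your code, 'add' will wrongly interpret 'x_host' and 'y_host' as device memory pointers. – JackOLantern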

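You are not checking for errors, and that is where this incorrect reasoning originates. Your 'add' kernel does not run properly, but the program carries on because no error is caught until 'cudaMemcpy'. Take a look at [this](http://stackoverflow.com/q/14038589/2386951). – Farzad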

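Your problem is that x_host and y_host are pointers to host memory space. The __global__ add function needs pointers to device memory space. The way you have built your code, add will wrongly interpret x_host and y_host as device memory pointers.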

As Farzad noted, you could have found the error yourself with proper CUDA error checking in the sense of "What is the canonical way to check for errors using the CUDA runtime API?".

Below is the code, fixed and with proper CUDA error checking.

#include <stdio.h> 
#include <stdlib.h> 
#include <cuda_runtime.h> 

// Wrap every CUDA runtime call so that a failure reports its file and line.
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); } 
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true) 
{ 
    if (code != cudaSuccess) 
    { 
        fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line); 
        if (abort) exit(code); 
    } 
} 

__global__ void add(int *a, int *b, int *c) 
{ 
    *c = *a - *b; 
} 

int main(void) 
{ 
    int* x_host = (int*)malloc(sizeof(int)); 
    int* y_host = (int*)malloc(sizeof(int)); 

    *x_host = 8; 
    *y_host = 4; 

    int* tempGPU; gpuErrchk(cudaMalloc((void**)&tempGPU,sizeof(int))); 
    int* x_dev;  gpuErrchk(cudaMalloc((void**)&x_dev, sizeof(int))); 
    int* y_dev;  gpuErrchk(cudaMalloc((void**)&y_dev, sizeof(int))); 

    gpuErrchk(cudaMemcpy(x_dev, x_host, sizeof(int), cudaMemcpyHostToDevice)); 
    gpuErrchk(cudaMemcpy(y_dev, y_host, sizeof(int), cudaMemcpyHostToDevice)); 

    int result; 

    add<<<1,1>>> (x_dev, y_dev, tempGPU); 
    gpuErrchk(cudaPeekAtLastError()); 
    gpuErrchk(cudaDeviceSynchronize()); 

    gpuErrchk(cudaMemcpy(&result, tempGPU, sizeof(int), cudaMemcpyDeviceToHost)); 

    printf("\n x_host - y_host = %d\n", result); 

    gpuErrchk(cudaFree(x_dev)); 
    gpuErrchk(cudaFree(y_dev)); 
    gpuErrchk(cudaFree(tempGPU)); 

    free(x_host); 
    free(y_host); 

    getchar(); 

    return 0; 

}

Source

2014-02-08 08:12:52 JackOLantern

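Yes, I deliberately called 'add' with pointers to host memory space, let's say, to see what would happen. So, **without checking CUDA errors**, an error in the CUDA runtime API will, so to speak, "disable" or "corrupt" the subsequent calls to CUDA functions? Is that why 'cudaMemcpy' gave me wrong results in my code example? – LeonelG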

@LeonelG With the wrong arguments to the 'add' function, your code gets stuck and does not execute the 'cudaMemcpy' calls after that on my system. – JackOLantern
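
To make the behaviour discussed in these comments visible, here is a minimal sketch that repeats the deliberately wrong launch from the question but prints the status of every runtime call instead of ignoring it. The `report` helper is a hypothetical name introduced only for this illustration. The sketch assumes a system where the GPU cannot dereference pageable host memory; on such a system the launch itself would typically report success (kernel launches are asynchronous), the error would surface at `cudaDeviceSynchronize`, and the later, unrelated `cudaMemcpy` calls would be expected to return the same error and leave `x_val` unchanged, because the CUDA documentation describes an illegal memory access as leaving the process in an inconsistent state in which further CUDA work returns the same error. On systems where the GPU can access host allocations directly, every call may simply succeed.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

__global__ void add(int *a, int *b, int *c)
{
    *c = *a - *b;
}

// Hypothetical helper (not part of the original code): print the status
// of a runtime call without aborting, so the later calls can be observed.
static void report(const char *what, cudaError_t err)
{
    printf("%-28s -> %s\n", what, cudaGetErrorString(err));
}

int main(void)
{
    int *x_host = (int *)malloc(sizeof(int));
    int *y_host = (int *)malloc(sizeof(int));
    *x_host = 8;
    *y_host = 4;

    int *tempGPU, *x_dev;
    report("cudaMalloc(tempGPU)", cudaMalloc((void **)&tempGPU, sizeof(int)));
    report("cudaMalloc(x_dev)", cudaMalloc((void **)&x_dev, sizeof(int)));

    // Deliberately wrong, as in the question: host pointers passed to a kernel.
    add<<<1, 1>>>(x_host, y_host, tempGPU);
    report("launch (cudaGetLastError)", cudaGetLastError());
    report("cudaDeviceSynchronize", cudaDeviceSynchronize());

    // Copies that have nothing to do with the kernel's arguments.
    int x_val = -5; // sentinel: stays -5 if the copy back does not happen
    report("cudaMemcpy host->device",
           cudaMemcpy(x_dev, x_host, sizeof(int), cudaMemcpyHostToDevice));
    report("cudaMemcpy device->host",
           cudaMemcpy(&x_val, x_dev, sizeof(int), cudaMemcpyDeviceToHost));
    printf("x_val = %d\n", x_val);

    report("cudaFree(x_dev)", cudaFree(x_dev));
    report("cudaFree(tempGPU)", cudaFree(tempGPU));
    free(x_host);
    free(y_host);
    return 0;
}
```

Printing each status rather than aborting on the first failure is a choice made only for this demonstration; in real code the gpuErrchk macro from the answer, which stops at the first failing call, is the more practical pattern.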
