2012-01-19 60 views

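Wrapping CUDA memory handling in a class leads to a corrupted memory address

To manage memory on the host and on the device more conveniently, I wrote the following class. In theory it should handle copying from host to device and back.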

struct CudaArray
{
    int* memoryHost;
    int* memoryDevice;

    int size;

    CudaArray(int datasize) // creates array on host and allocates memory on device with cudaMalloc
    {
        size = datasize;
        memoryHost = new int[size];

        for (int i = 0; i < size; i++)
        {
            memoryHost[i] = 0;
        }

        cudaMalloc((void**)&memoryDevice, sizeof(int) * size);
    }

    ~CudaArray() // frees memory on device and host
    {
        delete[] memoryHost;
        cudaFree(memoryDevice);
    }

    void Upload() // upload data from host to device
    {
        cudaMemcpy(memoryDevice, memoryHost, sizeof(int) * size, cudaMemcpyHostToDevice);
    }

    void Download() // download data from device to host
    {
        cudaMemcpy(memoryHost, memoryDevice, sizeof(int) * size, cudaMemcpyDeviceToHost);
    }

    void Insert(int* src); // copy from src to memoryHost
    void Retrieve(int* dest); // copy from memoryHost to dest
};

Inside the class, everything works fine. But when I use an object of my CudaArray, there is a problem with the pointer:

CudaArray cuda_ar(1000); 
kernel <<<blocks, threads_per_block>>> (cuda_ar.memoryDevice, cuda_ar.size); 

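Using the debugger, I managed to read the pointer memoryDevice. Inside the struct (for example, when stepping through Upload()) it is 0x01000000. But at the point where the kernel is launched, memoryDevice points to 0x00000400 (the numbers are examples).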

I know that memoryDevice is a pointer to memory on the device. Is there an explanation for this behaviour, and a solution to my problem?

Answers


When I run the following program

#include <cstdio> 
struct CudaArray
{
    int* memoryHost;
    int* memoryDevice;

    int size;

    CudaArray(int datasize) // creates array on host and allocates memory on device with cudaMalloc
    {
        size = datasize;
        memoryHost = new int[size];

        for (int i = 0; i < size; i++)
        {
            memoryHost[i] = 0;
        }

        cudaMalloc((void**)&memoryDevice, sizeof(int) * size);
    }

    ~CudaArray() // frees memory on device and host
    {
        delete[] memoryHost;
        cudaFree(memoryDevice);
    }

    void Upload() // upload data from host to device
    {
        cudaMemcpy(memoryDevice, memoryHost, sizeof(int) * size, cudaMemcpyHostToDevice);
    }

    void Download() // download data from device to host
    {
        cudaMemcpy(memoryHost, memoryDevice, sizeof(int) * size, cudaMemcpyDeviceToHost);
    }
};

__global__ void kernel(int *ptr, int n) 
{ 
    printf("On Device : %p %d\n", ptr, n); 
} 

int main(void) 
{ 
    CudaArray cuda_ar(1000); 
    printf("On Host : %p %d\n", cuda_ar.memoryDevice, cuda_ar.size); 
    kernel<<<1, 1>>>(cuda_ar.memoryDevice, cuda_ar.size); 
    return 0; 
} 

I get

On Host : 0x200400000 1000 
On Device : 0x200400000 1000 

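You should make sure that your CUDA runtime calls, such as cudaMalloc and cudaMemcpy, and your kernel launches all return successfully. To verify this, you can try the following code after every CUDA runtime call: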

if (cudaSuccess != cudaGetLastError()) 
    printf("Error!\n"); 
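
The check above can be wrapped in a small helper so that every runtime call and the kernel launch report a readable message. The following is only a sketch under assumptions: the macro CUDA_CHECK, the function run_kernel_checked and the kernel body are hypothetical names made up for illustration and are not part of the original post; the runtime calls themselves (cudaGetLastError, cudaGetErrorString, cudaDeviceSynchronize) are standard CUDA runtime API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): abort with a readable
// message if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",               \
                    __FILE__, __LINE__, cudaGetErrorString(err_));     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Illustrative kernel: each thread writes its global index into the array.
__global__ void fill_indices(int *ptr, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) ptr[i] = i;
}

// Allocates n ints on the device, runs the kernel with every step checked,
// and returns the last element copied back to the host.
int run_kernel_checked(int n)
{
    int *d_ptr = NULL;
    CUDA_CHECK(cudaMalloc((void**)&d_ptr, sizeof(int) * n));

    fill_indices<<<(n + 255) / 256, 256>>>(d_ptr, n);
    CUDA_CHECK(cudaGetLastError());       // errors reported at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    int h_last = -1;
    CUDA_CHECK(cudaMemcpy(&h_last, d_ptr + (n - 1), sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_ptr));
    return h_last;
}

int main(void)
{
    printf("last element: %d\n", run_kernel_checked(1000));
    return 0;
}
```

Checking cudaGetLastError() directly after the launch and again after a synchronization separates launch failures from failures that only surface while the kernel is executing.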

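Hmm, thanks for the hint about cudaGetLastError(). Because of some odd values in the Visual Studio project, the CUDA file was being compiled for sm_13, while my card only supports sm_11. Now the code above works! Thank you! – Martin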
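
A mismatch like the one described in this comment can be spotted by querying the compute capability of the GPU and comparing it with the architecture the code was built for. The following is a minimal sketch under assumptions: the helper capability_at_least is a hypothetical name, and the build target 1.3 is hard-coded only to mirror the sm_13 setting mentioned above; cudaGetDevice and cudaGetDeviceProperties are standard CUDA runtime calls. The comparison is only a necessary condition, not a full statement of CUDA's binary compatibility rules.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if a device with compute capability
// dev_major.dev_minor is at least req_major.req_minor.
static bool capability_at_least(int dev_major, int dev_minor,
                                int req_major, int req_minor)
{
    return dev_major > req_major ||
           (dev_major == req_major && dev_minor >= req_minor);
}

int main(void)
{
    int device = 0;
    cudaDeviceProp prop;

    // Query the current device and its properties, checking both calls.
    cudaError_t err = cudaGetDevice(&device);
    if (err == cudaSuccess)
        err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Device %d (%s): compute capability %d.%d\n",
           device, prop.name, prop.major, prop.minor);

    // Assumed build target for illustration only: sm_13.
    const int built_major = 1, built_minor = 3;
    if (!capability_at_least(prop.major, prop.minor, built_major, built_minor))
        printf("Warning: code built for sm_%d%d cannot run on this device\n",
               built_major, built_minor);

    return 0;
}
```

On an sm_11 card this prints the warning, which is the situation the comment describes; rebuilding for an architecture the card supports removes it.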