CUDA code does not enter the kernel

I want to learn CUDA, and I am trying to run a simple piece of code:

#include <stdlib.h> 
#include <stdio.h> 

__global__ void kernel(int *array) 
{ 
    int index = blockIdx.x * blockDim.x + threadIdx.x;

    array[index] = 7; 
} 

int main(void) 
{ 
    int num_elements = 256; 

    int num_bytes = num_elements * sizeof(int); 

    // pointers to host & device arrays 
    int *device_array = 0; 
    int *host_array = 0; 

    // malloc a host array 
    host_array = (int*)malloc(num_bytes); 

    // cudaMalloc a device array 
    cudaMalloc((void**)&device_array, num_bytes); 

    int block_size = 128; 
    int grid_size = num_elements/block_size; 

    kernel<<<grid_size,block_size>>>(device_array); 

    // download and inspect the result on the host: 
    cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost); 

    // print out the result element by element 
    for(int i=0; i < num_elements; ++i) 
    { 
        printf("%d ", host_array[i]);
    } 

    // deallocate memory 
    free(host_array); 
    cudaFree(device_array); 
}

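It should print 7s, but it prints 0s. It seems that the statement "kernel<<<grid_size,block_size>>>(device_array);" is not being executed. It does not give any compilation errors either. Any help?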


2013-01-17 user1986573

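I can't see anything obviously wrong with your code (apart from the lack of error checking, which you should get into the habit of adding as soon as possible). Do the CUDA SDK samples run correctly on your system? –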

There is no obvious cause of failure. Add error checking to the CUDA calls. Add 'cudaDeviceSynchronize' after the kernel call and check the error code it returns. – sgarizvi

Which graphics device, CUDA toolkit version, and driver version do you have installed? Checking for errors is never a waste of time. – pQB
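
The questions in the comments above (which device, which toolkit, which driver) can be answered from code. The following is a minimal sketch, not something posted in the thread; it assumes only the standard CUDA runtime API and checks every return value. The variable names are illustrative.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int driver_version = 0;
    int runtime_version = 0;
    int device_count = 0;

    // Latest CUDA version supported by the installed driver
    // (reported as 0 when no driver is installed).
    cudaError_t err = cudaDriverGetVersion(&driver_version);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDriverGetVersion: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Version of the CUDA runtime this program was built against.
    err = cudaRuntimeGetVersion(&runtime_version);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaRuntimeGetVersion: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Both versions are encoded as 1000 * major + 10 * minor.
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driver_version / 1000, (driver_version % 1000) / 10,
           runtime_version / 1000, (runtime_version % 1000) / 10);

    // Fails here if, for example, the driver is older than the runtime needs.
    err = cudaGetDeviceCount(&device_count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < device_count; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties(%d): %s\n", i,
                    cudaGetErrorString(err));
            return 1;
        }
        printf("device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }

    return 0;
}
```

If the driver supports an older CUDA version than the runtime, or the program was compiled for an architecture newer than the reported compute capability, a kernel launch can fail without any visible message unless errors are checked. Whether that is what happened to the asker is not known from the thread.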

The code runs fine on my machine, but make sure to add cudaDeviceSynchronize and error checking after the kernel call.

Change it as follows to check the error code:

kernel<<<grid_size,block_size>>>(device_array); 
// wait until tasks are completed 
cudaDeviceSynchronize(); 

// check for errors 
cudaError_t error = cudaGetLastError(); 
if (error != cudaSuccess) { 
    fprintf(stderr, "ERROR: %s \n", cudaGetErrorString(error)); 
}
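
For completeness, here is one way the whole program from the question might look with every runtime call checked. This is a sketch rather than the answerer's code: the CUDA_CHECK macro is a hypothetical helper introduced here for illustration, and the launch error (cudaGetLastError right after the launch) is checked separately from the execution error (the return value of cudaDeviceSynchronize).

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): print the failing call and
// its error string, then exit, whenever a runtime call does not succeed.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

__global__ void kernel(int *array)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    array[index] = 7;
}

int main(void)
{
    const int num_elements = 256;
    const size_t num_bytes = num_elements * sizeof(int);

    // host array
    int *host_array = (int*)malloc(num_bytes);
    if (host_array == NULL) {
        fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }

    // device array
    int *device_array = NULL;
    CUDA_CHECK(cudaMalloc((void**)&device_array, num_bytes));

    const int block_size = 128;
    const int grid_size = num_elements / block_size;  // exact: 256 / 128 == 2

    kernel<<<grid_size, block_size>>>(device_array);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    // copy the result back and print it element by element
    CUDA_CHECK(cudaMemcpy(host_array, device_array, num_bytes,
                          cudaMemcpyDeviceToHost));
    for (int i = 0; i < num_elements; ++i) {
        printf("%d ", host_array[i]);
    }
    printf("\n");

    free(host_array);
    CUDA_CHECK(cudaFree(device_array));
    return 0;
}
```

With checks like these, a program that prints 0s should also print the name of the first call that failed, which narrows the problem down to the driver, the device, or the build settings.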


2013-01-19 00:12:49 ipa
