2010-03-19

CUDA: cudaMemcpy only works in emulation mode

I've just started learning how to use CUDA. I'm trying to run some simple example code:


float *ah, *bh, *ad, *bd; 
ah = (float *)malloc(sizeof(float)*4); 
bh = (float *)malloc(sizeof(float)*4); 
cudaMalloc((void **) &ad, sizeof(float)*4); 
cudaMalloc((void **) &bd, sizeof(float)*4); 
/* ... initialize ah ... */

/* copy array on device */ 
cudaMemcpy(ad,ah,sizeof(float)*N,cudaMemcpyHostToDevice); 
cudaMemcpy(bd,ad,sizeof(float)*N,cudaMemcpyDeviceToDevice); 
cudaMemcpy(bh,bd,sizeof(float)*N,cudaMemcpyDeviceToHost); 

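When I run it in emulation mode (nvcc -deviceemu), it works fine and actually copies the array. But when I run it in regular mode, it runs without errors yet never copies the data. It's as if the cudaMemcpy lines are simply ignored.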

What am I doing wrong?

Thanks a lot,
Jason


Oops. This seems to be a problem with cudaMalloc(): it isn't allocating memory on the device. Why would that be? – Jason 2010-03-19 19:01:17


Did you initialize the device? Use cudaGetLastError to print the status. – Anycorn 2010-03-19 20:34:04


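@aaa: With the runtime API (functions prefixed with cuda rather than cu) you don't need to initialize the device explicitly; the runtime attaches to the first compatible device on the first CUDA call. – Tom 2010-03-20 17:24:05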
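If you still want that implicit initialization to happen at a known point, so that a missing driver or device is reported before the first cudaMalloc rather than inside it, a commonly used pattern is cudaSetDevice followed by a no-op cudaFree(0). The sketch below is an illustration under stated assumptions: it assumes device 0 is the one you want, and attachToDevice is a hypothetical helper name, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): select device `dev` and make the runtime
// initialize right away, so setup problems surface here instead of at the first
// cudaMalloc/cudaMemcpy. Returns true on success.
static bool attachToDevice(int dev) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return false;
    }
    if (dev < 0 || dev >= count) {
        std::fprintf(stderr, "device %d not available (%d device(s) found)\n", dev, count);
        return false;
    }
    err = cudaSetDevice(dev);
    if (err == cudaSuccess)
        err = cudaFree(0);  // freeing a null pointer is a no-op, but forces context creation
    if (err != cudaSuccess) {
        std::fprintf(stderr, "attaching to device %d: %s\n", dev, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main(void) {
    // Assumption: device 0 is the intended GPU.
    return attachToDevice(0) ? 0 : 1;
}
```

If this fails, the printed message (for example, no device or an insufficient driver) explains why later cudaMalloc and cudaMemcpy calls would silently do nothing.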

Answers


You should check for errors, ideally after every malloc and memcpy, but doing it just once at the end is enough (cudaGetErrorString(cudaGetLastError())).
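A minimal sketch of the "check once at the end" approach applied to the code from the question. It assumes four elements (matching the cudaMalloc sizes), and reportStatus is a hypothetical helper name. cudaGetLastError returns the last error recorded by any earlier runtime call in the same host thread and resets it to cudaSuccess, so a single check after the sequence will catch a failed cudaMalloc or cudaMemcpy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print `what` and the runtime's description of `err`.
// Returns true if `err` is cudaSuccess.
static bool reportStatus(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main(void) {
    const int n = 4;                        // assumed element count, as in cudaMalloc
    const size_t bytes = sizeof(float) * n;
    float ah[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float bh[n] = {0.0f};
    float *ad = NULL, *bd = NULL;

    // Same sequence as the question, with no per-call checks.
    cudaMalloc((void **)&ad, bytes);
    cudaMalloc((void **)&bd, bytes);
    cudaMemcpy(ad, ah, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(bd, ad, bytes, cudaMemcpyDeviceToDevice);
    cudaMemcpy(bh, bd, bytes, cudaMemcpyDeviceToHost);

    // One check at the end picks up the last error produced by any call above.
    bool ok = reportStatus(cudaGetLastError(), "cudaMalloc/cudaMemcpy sequence");

    cudaFree(ad);                           // cudaFree(NULL) is a harmless no-op
    cudaFree(bd);
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

The trade-off is that a single final check tells you that something failed but not which call; checking each return value, as the answer recommends, pinpoints it.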

Just to check the obvious:

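  • You do have a CUDA-capable GPU, right? Run the deviceQuery SDK sample to check that the device works and that all the drivers are installed and working.
  • N (in the memcpy calls) equals 4 (in the malloc calls), right?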
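Both points can be covered by deriving every size from one constant and checking each call individually. The following is a minimal sketch, not the asker's program: it assumes four elements and fills ah with sample values, and CUDA_CHECK and sameContents are hypothetical helper names rather than CUDA APIs.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: on failure, report file, line, the failing call and the
// runtime's error message, then exit.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,      \
                         __LINE__, #call, cudaGetErrorString(err_));      \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Returns true if the first n elements of a and b are identical.
static bool sameContents(const float *a, const float *b, int n) {
    for (int i = 0; i < n; ++i)
        if (a[i] != b[i]) return false;
    return true;
}

int main(void) {
    const int N = 4;                        // single source for every size below
    const size_t bytes = sizeof(float) * N;

    float *ah = (float *)std::malloc(bytes);
    float *bh = (float *)std::malloc(bytes);
    if (!ah || !bh) return EXIT_FAILURE;
    for (int i = 0; i < N; ++i) { ah[i] = (float)i + 1.0f; bh[i] = 0.0f; }

    float *ad = NULL, *bd = NULL;
    CUDA_CHECK(cudaMalloc((void **)&ad, bytes));
    CUDA_CHECK(cudaMalloc((void **)&bd, bytes));

    CUDA_CHECK(cudaMemcpy(ad, ah, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(bd, ad, bytes, cudaMemcpyDeviceToDevice));
    CUDA_CHECK(cudaMemcpy(bh, bd, bytes, cudaMemcpyDeviceToHost));

    // Verify the round trip actually moved the data.
    std::printf("copy %s\n", sameContents(ah, bh, N) ? "succeeded" : "FAILED");

    CUDA_CHECK(cudaFree(ad));
    CUDA_CHECK(cudaFree(bd));
    std::free(ah);
    std::free(bh);
    return 0;
}
```

If cudaMalloc cannot allocate (as Jason's comment suggests), the program stops at that line with the runtime's explanation instead of continuing with unusable pointers.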

Check whether you actually have a CUDA-capable device. You could try running the code below and see what information you get:

#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    cudaDeviceProp prop;

    // Report why no devices are listed, instead of silently printing nothing.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA-capable devices found: %d\n", count);

    for (int i = 0; i < count; i++) {
        cudaGetDeviceProperties(&prop, i);
        printf(" --- General Information for device %d ---\n", i);
        printf("Name: %s\n", prop.name);
        printf("Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("Clock rate (kHz): %d\n", prop.clockRate);
        printf("Device copy overlap: ");
        if (prop.deviceOverlap)
            printf("Enabled\n");
        else
            printf("Disabled\n");
        printf("Kernel execution timeout: ");
        if (prop.kernelExecTimeoutEnabled)
            printf("Enabled\n");
        else
            printf("Disabled\n");

        // size_t fields are printed with %zu.
        printf(" --- Memory Information for device %d ---\n", i);
        printf("Total global mem: %zu\n", prop.totalGlobalMem);
        printf("Total constant mem: %zu\n", prop.totalConstMem);
        printf("Max mem pitch: %zu\n", prop.memPitch);
        printf("Texture alignment: %zu\n", prop.textureAlignment);

        printf(" --- MP Information for device %d ---\n", i);
        printf("Multiprocessor count: %d\n",
               prop.multiProcessorCount);
        printf("Shared mem per block: %zu\n", prop.sharedMemPerBlock);
        printf("Registers per block: %d\n", prop.regsPerBlock);
        printf("Threads in warp: %d\n", prop.warpSize);
        printf("Max threads per block: %d\n",
               prop.maxThreadsPerBlock);
        printf("Max thread dimensions: (%d, %d, %d)\n",
               prop.maxThreadsDim[0], prop.maxThreadsDim[1],
               prop.maxThreadsDim[2]);
        printf("Max grid dimensions: (%d, %d, %d)\n",
               prop.maxGridSize[0], prop.maxGridSize[1],
               prop.maxGridSize[2]);
        printf("\n");
    }
    return 0;
}