Learning CUDA, but currently stuck

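I've been working on learning CUDA, but I'm currently stuck and don't know what I'm doing wrong. I am trying to set the initial values of the opool array based on random floats between 0 and 1. If anyone could shed some light on what I'm doing wrong, it would be greatly appreciated. Note: I omitted some code for brevity (the cudaFree() and free() calls). I apologize if I left out any important code.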

__global__ void FirstLoop(int *opool, float *randomSet, int omax, int anumber)
{
    int tid_loci = threadIdx.x;
    int tid_2 = threadIdx.y;
    int bid_omax = blockIdx.x;

    int index = omax*tid_loci*2 + omax*tid_2 + bid_omax;
    float r = randomSet[ index ];

    // Commented out code is what it should be set to, but they are set to 5 or 15
    // to determine if the values are correctly being set.
    if (r < 0.99 )
        opool[ index ] = 15; //(int)((r * 100.0) * -1.0);
    else
        opool[ index ] = 5; //(int)((r)*(float)(anumber-4)) +5;
}

int main() 
{ 
    int loci = 10; 
    int omax = 20; 

     // Data stored on the host 
    int *h_opool; 
    float *h_randomSet; 

    // Data stored on the device 
    int *d_opool; 
    float *d_randomSet; 

    int poolSize = helpSize * omax; 
    int randomSize = loci * 2 * omax * sizeof(float); 

    // RESIZE ARRAYS TO NEEDED SIZE 
    h_opool = (int*)malloc(poolSize);  
    h_randomSet= (float*)malloc(randomSize); 

    cudaMalloc(&d_opool, poolSize); 
    cudaMalloc(&d_randomSet,randomSize); 


    for (sim=0; sim<smax; sim++)
    {
        for (i=0; i<poolSize; i++)
            h_randomSet[i] = rndm();

        dim3 blocks(omax);
        dim3 thread(loci, 2);
        cudaMemcpy(d_randomSet, h_randomSet, randomSize, cudaMemcpyHostToDevice);
        cudaMemcpy(d_opool, h_opool, poolSize, cudaMemcpyHostToDevice);
        FirstLoop<<< blocks, thread >>>(d_opool, d_randomSet, omax, anumber);
        cudaMemcpy(h_opool, d_opool, poolSize, cudaMemcpyDeviceToHost);

        // Here is when I call printf to see the values stored in h_opool, but they are
        // completely wrong
    }
} 
float rndm() 
{ 
    int random = rand(); 
    return ((float)random/(float)RAND_MAX); 
}


2011-12-14 Ronnie

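What is your error? Compiler? Linker? Unexpected behavior? Details, please. You could try doing something very simple and non-random to help narrow down the problem. – axon 2011-12-14 02:00:55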
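One simple, non-random check along those lines can be run entirely on the host: replay the launch configuration from the question (omax blocks of loci x 2 threads) and count how many threads write each element. The sketch below is plain C++ rather than device code, and the names Coverage, countWrites, exactlyOnce and checkQuestionFormula are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not from the post): the result of replaying a launch.
struct Coverage {
    std::vector<int> hits;  // number of writes per element
    int outOfRange = 0;     // writes that would land outside the array
};

// Replays a launch of `blocks` blocks of dimX x dimY threads on the host.
// indexOf(block, tx, ty) stands in for the index expression in the kernel.
template <typename IndexFn>
Coverage countWrites(int blocks, int dimX, int dimY, int elements, IndexFn indexOf)
{
    Coverage c;
    c.hits.assign(elements, 0);
    for (int b = 0; b < blocks; ++b)
        for (int ty = 0; ty < dimY; ++ty)
            for (int tx = 0; tx < dimX; ++tx) {
                int index = indexOf(b, tx, ty);
                if (index < 0 || index >= elements)
                    ++c.outOfRange;
                else
                    ++c.hits[index];
            }
    return c;
}

// True when every element is written by exactly one thread.
bool exactlyOnce(const Coverage& c)
{
    if (c.outOfRange != 0) return false;
    for (std::size_t i = 0; i < c.hits.size(); ++i)
        if (c.hits[i] != 1) return false;
    return true;
}

// Applies the check to the index expression from the question,
// using loci = 10 and omax = 20 as in the posted main().
void checkQuestionFormula()
{
    const int loci = 10, omax = 20;
    Coverage c = countWrites(omax, loci, 2, loci * 2 * omax,
        [omax](int bid, int tx, int ty) { return omax*tx*2 + omax*ty + bid; });
    assert(exactlyOnce(c));
}
```

If this check passes for the posted expression, the indexing alone may not explain the wrong output. The byte counts handed to malloc, cudaMalloc and cudaMemcpy would then be worth inspecting: poolSize is built from helpSize, which is not shown in the post, and the fill loop uses poolSize as an element count for h_randomSet.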

Change the following

int index = omax*tid_loci*2 + omax*tid_2 + bid_omax;

to

int index = bid_omax * tid_2 + tid_loci;

However, a 10 x 2 block configuration may not be the best choice. Try using 32 x 1 or 16 x 2.
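For comparison, a common way to give every thread its own element in a 1D grid of 2D blocks is to linearize the block and thread coordinates. The sketch below does this in plain host C++ so it can be checked without a device; in a kernel the arguments would correspond to blockIdx.x, blockDim.x, blockDim.y, threadIdx.x and threadIdx.y. The names linearIndex, quotedIndex, distinctTargets and blocksFor are hypothetical and are not from the answer. Taken literally, the replacement expression quoted above sends every thread with threadIdx.y == 0 to element threadIdx.x regardless of its block, so it may have lost terms somewhere along the way; the count makes that visible.

```cpp
#include <cassert>
#include <vector>

// Conventional linearization for a 1D grid of 2D blocks. This is an assumption
// about the intended layout, not the answerer's code.
int linearIndex(int blockId, int dimX, int dimY, int tx, int ty)
{
    int threadsPerBlock = dimX * dimY;
    int threadInBlock = ty * dimX + tx;
    return blockId * threadsPerBlock + threadInBlock;
}

// The replacement expression exactly as quoted in the answer, for comparison.
int quotedIndex(int blockId, int tx, int ty)
{
    return blockId * ty + tx;
}

// Counts how many distinct in-range elements a launch of `blocks` blocks of
// dimX x dimY threads would touch for a given index expression.
template <typename IndexFn>
int distinctTargets(int blocks, int dimX, int dimY, int elements, IndexFn indexOf)
{
    std::vector<bool> seen(elements, false);
    int distinct = 0;
    for (int b = 0; b < blocks; ++b)
        for (int ty = 0; ty < dimY; ++ty)
            for (int tx = 0; tx < dimX; ++tx) {
                int i = indexOf(b, tx, ty);
                if (i >= 0 && i < elements && !seen[i]) {
                    seen[i] = true;
                    ++distinct;
                }
            }
    return distinct;
}

// Blocks needed to cover `elements` items with a fixed block size (rounds up).
int blocksFor(int elements, int threadsPerBlock)
{
    return (elements + threadsPerBlock - 1) / threadsPerBlock;
}
```

With loci = 10 and omax = 20, linearIndex reaches all 400 elements, while the quoted expression reaches far fewer. On the block-shape advice: 32 x 1 and 16 x 2 both give 32 threads per block, which matches the warp size on NVIDIA GPUs. With 32 threads per block, blocksFor(400, 32) gives 13 blocks, so the last block has spare threads and the kernel would need a guard such as if (index < elements) before writing.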


2011-12-14 05:27:32
