CUDA threads output different values

I wrote a CUDA program with the kernel function shown below. The device memory is allocated with cudaMalloc(), and the value of *md is 10.

__global__ void add(int *md)
{
    int x, oper = 2;
    x = threadIdx.x;

    *md = *md * oper;

    if (x == 1)
    {
        *md = *md * 0;
    }

    if (x == 2)
    {
        *md = *md * 10;
    }

    if (x == 3)
    {
        *md = *md + 1;
    }

    if (x == 4)
    {
        *md = *md - 1;
    }
}

The code above is launched in two ways:

add<<<1,5>>>(md) and add<<<1,4>>>(md)

for <<<1,5>>> the output is 19 

for <<<1,4>>> the output is 21

1) Does cudaMalloc() allocate in the device's main memory?
2) Why is it always the last thread whose result shows up in the program above?

Thanks

Source

2011-02-24 kar

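A bunch of things are wrong here. Check your return statuses; your program probably has a segmentation fault that is going unreported. – Anycorn 2011-02-24 07:26:15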
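Checking return statuses, as the comment suggests, could look like the sketch below. The `CUDA_CHECK` macro and the `scale` kernel are hypothetical names introduced only for illustration; the runtime calls themselves (`cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`) are standard CUDA runtime API functions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original post): abort with a
// readable message if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Illustrative kernel: only thread 0 touches *md, so there is no race.
__global__ void scale(int *md)
{
    if (threadIdx.x == 0)
        *md = *md * 2;
}

int main()
{
    int host_value = 10;
    int *md = nullptr;

    CUDA_CHECK(cudaMalloc((void **)&md, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(md, &host_value, sizeof(int),
                          cudaMemcpyHostToDevice));

    scale<<<1, 5>>>(md);                  // pass the device pointer itself
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(&host_value, md, sizeof(int),
                          cudaMemcpyDeviceToHost));
    printf("%d\n", host_value);

    CUDA_CHECK(cudaFree(md));
    return 0;
}
```

A kernel launch returns no status of its own, which is why the sketch queries `cudaGetLastError()` right after the launch and then synchronizes before trusting the result.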

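Every thread in your code writes a different result to the same location (md). This is a data race: the order in which those writes land is not defined, so when the program finishes md can hold any one of 4-5 possible values.

If you want to capture the output of every thread, here is what you should do: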

// The size of output should equal the number of threads in the block
__global__ void add(int input, int *output)
{
    int x = threadIdx.x;
    int oper = 2;
    int md = input * oper;    // per-thread local copy: 10 * 2 = 20

    // Thread indices start from 0 in CUDA

    if (x == 0)
        output[0] = md * 0;   // output is 0

    if (x == 1)
        output[1] = md * 10;  // output is 200

    if (x == 2)
        output[2] = md + 1;   // output is 21

    if (x == 3)
        output[3] = md - 1;   // output is 19

    // ... and so on
}

Host code to run it:

int value = 10;
int *out;
int size = 5 * sizeof(int);
cudaMalloc((void**)&out, size);

add<<<1,5>>>(value, out);

int *host_out = (int*)malloc(size);
cudaMemcpy(host_out, out, size, cudaMemcpyDeviceToHost);

// Now host_out should hold the following values:
// host_out[0] = 0
// host_out[1] = 200
// host_out[2] = 21
// host_out[3] = 19
// host_out[4] = ..
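The kernel above avoids the race by giving each thread its own output slot. If the operations really had to be applied to the single location *md in a fixed order, one possible sketch is to let the threads take turns, separated by `__syncthreads()`. The kernel name `add_ordered` is hypothetical, and the sketch assumes a single block, because `__syncthreads()` only synchronizes threads within one block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: apply the steps to *md one thread at a time, in thread-index order.
__global__ void add_ordered(int *md)
{
    int x = threadIdx.x;

    if (x == 0)
        *md = *md * 2;           // the "oper" step, done exactly once
    __syncthreads();

    // Every thread runs the same loop, so each __syncthreads() is reached
    // by the whole block; only the thread whose turn it is touches *md.
    for (int turn = 1; turn < blockDim.x; ++turn) {
        if (x == turn) {
            if (x == 1) *md = *md * 0;
            if (x == 2) *md = *md * 10;
            if (x == 3) *md = *md + 1;
            if (x == 4) *md = *md - 1;
        }
        __syncthreads();
    }
}

int main()
{
    int value = 10;
    int *md = nullptr;

    cudaMalloc((void **)&md, sizeof(int));
    cudaMemcpy(md, &value, sizeof(int), cudaMemcpyHostToDevice);

    add_ordered<<<1, 5>>>(md);   // one block of five threads

    cudaMemcpy(&value, md, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d\n", value);

    cudaFree(md);
    return 0;
}
```

This serializes the work, so it gives up the parallelism entirely; it is only meant to show what a defined ordering would require. Per-thread output slots, as in the answer, are usually the better fit.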

Source

2011-02-24 10:35:56 jwdmsd

Thank you, I understand now. – kar 2011-02-25 04:31:03
