CUDA programming: modifying the blockIdx.x index


#define N 6 // number of elements (one block per element)

__global__ void add(int *a, int *b, int *c) {
    int tid = blockIdx.x; // handle the data at this index
    if (tid < N)
        c[tid] = a[tid] + b[tid];
}

add<<<6,1>>>(a, b, c); // 6 blocks running on the GPU; a, b, c are device pointers

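The code above is a basic sum of two vectors. However, I want to change which indices of the arrays are added together. For example, if my first array is A = [1,2,3,4,5,6] and B = [10,20,30,40,50,60], I want to use the elements of A and B to get C = [1 + 60, 2 + 50, 3 + 40, 4 + 30, 5 + 20, 6 + 10]. blockIdx.x seems to be incremented by 1 automatically, so I don't know how to modify it.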


2017-05-18 Mint.K

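Post the actual code, not something similar to it. Also, don't spam tags; this has nothing to do with C++/C in any meaningful sense. By the way, on the array length –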

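Please read the [CUDA Programming Guide](http://docs.nvidia.com/cuda/cuda-c-programming-guide/) carefully, especially Chapter 2. In short: you cannot modify 'blockIdx', 'threadIdx' or similar variables. Their combination is unique to each thread. – Shadow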

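As Shadow said, every thread is given its threadIdx, blockDim, blockIdx and gridDim values by the runtime: threadIdx and blockIdx identify the thread, while blockDim and gridDim describe the launch configuration. You cannot modify any of them, but you can compute your own index from them.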

For your example, you can use gridDim.x to get the number of blocks, like this:

__global__ void add(const int *a, const int *b, int *c) 
{ 
    int tid = blockIdx.x; 
    c[tid] = a[tid] + b[(gridDim.x - 1)- tid]; 
}
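A minimal host-side sketch that runs the kernel above on the arrays from the question. The helper `reverse_add` is hypothetical (it is not part of the original answer); it assumes both inputs have the same non-zero length and skips CUDA error checking to stay short. `cudaMalloc`, `cudaMemcpy` and `cudaFree` are the standard CUDA runtime calls.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Kernel from the answer: block i pairs a[i] with the mirrored element of b.
__global__ void add(const int *a, const int *b, int *c)
{
    int tid = blockIdx.x;                       // one element per block
    c[tid] = a[tid] + b[(gridDim.x - 1) - tid]; // mirrored index into b
}

// Hypothetical helper: copies a and b to the device, launches one
// single-thread block per element, and copies the result back.
// Error checking is omitted for brevity.
std::vector<int> reverse_add(const std::vector<int> &a, const std::vector<int> &b)
{
    assert(a.size() == b.size() && !a.empty());
    const int n = static_cast<int>(a.size());
    const size_t bytes = a.size() * sizeof(int);

    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    add<<<n, 1>>>(d_a, d_b, d_c); // n blocks, so gridDim.x == n inside the kernel

    std::vector<int> c(n);
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost); // waits for the kernel
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```

Calling `reverse_add({1, 2, 3, 4, 5, 6}, {10, 20, 30, 40, 50, 60})` should return {61, 52, 43, 34, 25, 16}, which is C from the question. Because the mirrored index is derived from gridDim.x, this only works when exactly one block is launched per element.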

To make sure tid stays within the array bounds, you can pass the number of array elements as an argument.

__global__ void add(const int *a, const int *b, int *c, const int N) 
{ 
    int tid = blockIdx.x; 
    if (tid < N) 
     c[tid] = a[tid] + b[(gridDim.x - 1)- tid]; 
}

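If you launch this kernel as add<<<6, 1>>>(a, b, c, 6), the if (tid < N) check is redundant, because you launch only 6 blocks. In general, though, you launch many blocks with many threads each, and the last block may contain some padding threads.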

unsigned int N = 1000; // total number of elements 
dim3 blkDim{ 32 }; 
dim3 grdDim{ (N + 32 - 1)/32 }; 
add<<<grdDim, blkDim>>>(a, b, c, N);
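As a worked example of the launch configuration above, with N = 1000 and 32 threads per block the rounded-up integer division gives:

```latex
\text{grdDim.x} = \left\lfloor \frac{N + 32 - 1}{32} \right\rfloor
               = \left\lfloor \frac{1031}{32} \right\rfloor = 32,
\qquad 32 \times 32 = 1024 \text{ threads},
\qquad 1024 - 1000 = 24 \text{ padding threads}
```

The last block (blockIdx.x = 31) covers tid = 992 to 1023, so only its first 8 threads have tid < N; the other 24 are padding and must not touch the arrays.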

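In this case you must check each array index against the bounds. Note that the mirrored index now uses N instead of gridDim.x, because the grid no longer has exactly one block per element.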

__global__ void add(const int *a, const int *b, int *c, const int N) 
{ 
    int tid = blockIdx.x * blockDim.x + threadIdx.x; 
    if (tid < N) 
     c[tid] = a[tid] + b[(N - 1)- tid]; 
}
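A host-side sketch of the general launch, assuming one element per thread and the 32-thread blocks used above. `reverse_add_general` and its `threadsPerBlock` parameter are hypothetical names chosen for illustration; error handling covers only the launch and the final copy, not the allocations.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// General kernel from the answer: one element per thread, many threads per
// block, so the last block may contain padding threads that must do nothing.
__global__ void add(const int *a, const int *b, int *c, const int N)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
    if (tid < N)                                     // skip padding threads
        c[tid] = a[tid] + b[(N - 1) - tid];          // mirror with N, not gridDim.x
}

// Hypothetical helper: launches ceil(n / threadsPerBlock) blocks and returns
// an empty vector if the launch or the copy back fails.
std::vector<int> reverse_add_general(const std::vector<int> &a,
                                     const std::vector<int> &b,
                                     unsigned int threadsPerBlock = 32)
{
    assert(a.size() == b.size() && !a.empty() && threadsPerBlock > 0);
    const unsigned int n = static_cast<unsigned int>(a.size());
    const size_t bytes = a.size() * sizeof(int);

    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    dim3 blkDim{ threadsPerBlock };
    dim3 grdDim{ (n + threadsPerBlock - 1) / threadsPerBlock }; // round up
    add<<<grdDim, blkDim>>>(d_a, d_b, d_c, static_cast<int>(n));
    cudaError_t err = cudaGetLastError(); // reports invalid launch configurations

    std::vector<int> c(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost); // waits for the kernel

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    if (err != cudaSuccess)
        c.clear();
    return c;
}
```

For example, with a[i] = i and b[i] = 10 * i for 1000 elements, element i of the result should be i + 10 * (999 - i). A block size that does not divide the element count (say 128) exercises the padding path, since the `tid < N` check is what keeps the extra threads from writing past the end of c.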


2017-05-18 03:21:07 nglee

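This answer works in a very specific case, but you should at least mention that, for example, padding could be a problem. Passing the array size is the way to go (as Passer By already mentioned). – Shadow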

@Shadow Edited the answer in response. – nglee

Nice. I appreciate you taking the time to edit your answer. (By the way, the downvote wasn't mine, so I can't push you up to "+1".) – Shadow
