Why does this kernel produce uncoalesced stores:
__global__ void reverseArrayBlock(int *d_out, int *d_in)
{
    int inOffset  = blockDim.x * blockIdx.x;
    int outOffset = blockDim.x * (gridDim.x - 1 - blockIdx.x);
    int in  = inOffset + threadIdx.x;
    int out = outOffset + (blockDim.x - 1 - threadIdx.x);
    d_out[out] = d_in[in];
}
while this one does not?
__global__ void reverseArrayBlock(int *d_out, int *d_in)
{
    extern __shared__ int s_data[];

    int inOffset = blockDim.x * blockIdx.x;
    int in = inOffset + threadIdx.x;

    // Load one element per thread from device memory and store it
    // *in reversed order* into temporary shared memory
    s_data[blockDim.x - 1 - threadIdx.x] = d_in[in];

    // Block until all threads in the block have written their data to shared mem
    __syncthreads();

    // Write the data from shared memory in forward order,
    // but to the reversed block offset as before
    int outOffset = blockDim.x * (gridDim.x - 1 - blockIdx.x);
    int out = outOffset + threadIdx.x;
    d_out[out] = s_data[threadIdx.x];
}
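I know the second one uses shared memory. But when I look at the indices used for d_out, they seem to be the same in both kernels. Can you help me understand the difference?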
Thanks for the clarification. – WannabeCoder 2011-06-02 21:26:46
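Note that the sample code only generates uncoalesced stores on Compute Capability 1.1 and earlier devices. Starting with 1.2 devices, this access pattern is fully coalesced. – harrism 2011-06-06 09:19:53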
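
For anyone comparing the two d_out indices by hand, here is a minimal host-side sketch in plain C++ (no GPU needed) that reproduces only the index arithmetic from the two kernels. The function names outIndexDirect, outIndexShared and printStoreIndices are hypothetical helpers made up for this illustration, and the launch sizes in the usage note are assumed values. It suggests that both kernels cover the same range of d_out, but the first walks it downwards as threadIdx.x increases while the second walks it upwards, which is the ordering that matters on the older devices mentioned in the comment above.

```cpp
#include <cassert>
#include <cstdio>

// Store index used by the first kernel (no shared memory):
// consecutive threads write to DEcreasing addresses.
int outIndexDirect(int blockDim, int gridDim, int blockIdx, int threadIdx) {
    assert(threadIdx >= 0 && threadIdx < blockDim);
    int outOffset = blockDim * (gridDim - 1 - blockIdx);
    return outOffset + (blockDim - 1 - threadIdx);
}

// Store index used by the second kernel (shared memory):
// consecutive threads write to INcreasing addresses.
int outIndexShared(int blockDim, int gridDim, int blockIdx, int threadIdx) {
    assert(threadIdx >= 0 && threadIdx < blockDim);
    int outOffset = blockDim * (gridDim - 1 - blockIdx);
    return outOffset + threadIdx;
}

// Print the store index of the first `count` threads of one block
// for both kernels, side by side.
void printStoreIndices(int blockDim, int gridDim, int blockIdx, int count) {
    for (int t = 0; t < count; ++t) {
        std::printf("thread %2d: direct -> %4d, shared -> %4d\n", t,
                    outIndexDirect(blockDim, gridDim, blockIdx, t),
                    outIndexShared(blockDim, gridDim, blockIdx, t));
    }
}
```

Calling printStoreIndices(256, 4, 0, 16) from main (an assumed launch of 4 blocks of 256 threads) lists one half-warp of block 0: the direct kernel's indices count down from 1023 while the shared-memory kernel's count up from 768. The reversal still happens in the second kernel, but it is done inside shared memory, so the global store itself stays in thread order.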