CUDA: entry function shared data
Does anyone know why the following function uses 16 432 B of shared data? As far as I can tell, it should be 32 x 32 x 8 x 2 = 16 384 B.
__global__ void matrixMulKernel(double *c, const double *a, const double *b, unsigned int size)
{
    __shared__ double as[32][32];
    __shared__ double bs[32][32];
    unsigned int bx = blockIdx.x, by = blockIdx.y;
    unsigned int tx = threadIdx.x, ty = threadIdx.y;
    unsigned int row = bx * TILE_WIDTH + tx;
    unsigned int col = by * TILE_WIDTH + ty;
    double Pval = 0.0;
    for(unsigned int q = 0; q < size/TILE_WIDTH; q++)
    {
        as[tx][ty] = a[row * size + q * TILE_WIDTH + ty];
        bs[ty][tx] = b[(q * TILE_WIDTH + tx) * size + col];
        __syncthreads();
        for(unsigned int k = 0; k < TILE_WIDTH; k++)
            Pval += as[tx][k] * bs[k][ty];
        __syncthreads();
    }
    c[row * size + col] = Pval;
}
The compiler gives the following error:
Entry function '_Z15matrixMulKernelPdPKdS1_j' uses too much shared data (0x4030 bytes, 0x4000 max)
I am interested in why this happens rather than in a workaround :)
Thanks, Robert. I have compiled it with nvcc -arch=sm_30. I was just curious why it works this way. Regards – b1es