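This is a slightly speculative answer, but keep in mind that the pitch of an allocation must satisfy two alignment properties for textures: one for the texture pointer and one for the texture rows. I suspect that cudaMallocPitch is honouring the former, which is defined by cudaDeviceProp::textureAlignment. For example: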
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    const int ncases = 12;
    // Requested row widths in bytes
    const size_t widths[ncases] = { 5, 10, 20, 50, 70, 90, 100,
                                    200, 500, 700, 900, 1000 };
    const size_t height = 10;
    float *vals[ncases];
    size_t pitches[ncases];

    struct cudaDeviceProp p;
    cudaGetDeviceProperties(&p, 0);
    fprintf(stdout, "Texture alignment = %zu bytes\n",
            p.textureAlignment);

    cudaSetDevice(0);
    cudaFree(0); // establish context

    // Allocate one pitched buffer per width and report the pitch chosen
    for(int i=0; i<ncases; i++) {
        cudaMallocPitch((void **)&vals[i], &pitches[i],
                        widths[i], height);
        fprintf(stdout, "width = %zu <=> pitch = %zu \n",
                widths[i], pitches[i]);
    }
    return 0;
}
which gives the following on a GT320M:
Texture alignment = 256 bytes
width = 5 <=> pitch = 256
width = 10 <=> pitch = 256
width = 20 <=> pitch = 256
width = 50 <=> pitch = 256
width = 70 <=> pitch = 256
width = 90 <=> pitch = 256
width = 100 <=> pitch = 256
width = 200 <=> pitch = 256
width = 500 <=> pitch = 512
width = 700 <=> pitch = 768
width = 900 <=> pitch = 1024
width = 1000 <=> pitch = 1024
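The pitches above are all consistent with rounding each requested width up to the next multiple of the reported 256-byte texture alignment. The following host-only sketch checks that arithmetic. Note that `round_up_to_alignment` is a hypothetical helper written for illustration, not part of the CUDA API, and that the rounding rule is an assumption inferred from this output; the pitch actually returned by cudaMallocPitch is chosen by the driver and may differ on other devices.

```cpp
#include <cstddef>

// Hypothetical helper (not a CUDA API call): round a row width in bytes
// up to the next multiple of `alignment`. Assumes alignment > 0.
constexpr std::size_t round_up_to_alignment(std::size_t width,
                                            std::size_t alignment)
{
    return ((width + alignment - 1) / alignment) * alignment;
}

// Compile-time checks against rows of the GT320M output above,
// using the reported texture alignment of 256 bytes.
static_assert(round_up_to_alignment(5, 256) == 256, "width 5");
static_assert(round_up_to_alignment(200, 256) == 256, "width 200");
static_assert(round_up_to_alignment(500, 256) == 512, "width 500");
static_assert(round_up_to_alignment(700, 256) == 768, "width 700");
static_assert(round_up_to_alignment(900, 256) == 1024, "width 900");
static_assert(round_up_to_alignment(1000, 256) == 1024, "width 1000");
```

If a device returned a pitch that does not match this rule for its reported textureAlignment, that would suggest a different alignment property is in play.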
I am guessing that cudaDeviceProp::texturePitchAlignment applies to CUDA arrays.
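What hardware is this? I have always found that cudaMallocPitch honours the reported texture alignment. On the only device I have access to at the moment, the reported alignment is 256 bytes, and I always get pitches that are multiples of 256 bytes. – talonmies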
I have updated the question and added the detailed system configuration to it. – sgarizvi