How do I embed a CUDA texture object in a structure?

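We have successfully used the following post to help create structures containing basic types such as int *. Textures give a good performance boost for read-only arrays. We use many of them, which makes the parameter lists of kernels and kernel sub-functions long and complicated. We would like to embed textures in a structure to reduce the length and complexity of the parameter lists. How do I embed a CUDA texture object in a structure?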

Copying a struct containing pointers to CUDA device

Here is a representative snippet of the approach we use. It compiles, but it crashes at runtime.

// Initialize texture description 
memset(&textureDescription, 0, sizeof(textureDescription)); 
textureDescription.readMode = cudaReadModeElementType; 

// Create Texture from variable 
cudaTextureObject_t texture = 0; 
cudaResourceDesc resource; 
memset(&resource, 0, sizeof(resource)); 
resource.resType = cudaResourceTypeLinear; 
resource.res.linear.devPtr = intArray; 
resource.res.linear.desc.f = cudaChannelFormatKindSigned; 
resource.res.linear.desc.x = 32; // bits per channel 
resource.res.linear.sizeInBytes = count*sizeof(int); 
cudaCreateTextureObject(&texture, &resource, &textureDescription, NULL); 

// These declarations are in the .h file 
typedef struct SampleStructure { 
    cudaTextureObject_t texture; 
} SampleStructure; 
SampleStructure *structureHost; 
SampleStructure *structureDevice; 

// Create host and device structures 
structureHost = (SampleStructure *)malloc(sizeof(SampleStructure)); 
cudaMalloc(&structureDevice, sizeof(SampleStructure)); 

// Assign the texture object to the host structure 
structureHost->texture = texture; 

// Copy the host structure to Global Memory 
cudaMemcpy(structureDevice, structureHost, sizeof(SampleStructure), cudaMemcpyHostToDevice); 

// Pass Texture and Texture-embedded-in-structure to kernel 
kernel<<<1,1>>>(texture, structureDevice); 

... 
__global__ void 
kernel(cudaTextureObject_t texture, SampleStructure *structureDevice) { 
    value = tex1Dfetch<int>(texture, index); // Runs successfully at runtime 
    value = tex1Dfetch<int>(structureDevice->texture, index); // Crashes at runtime 
}

When the kernel (or a sub-function) uses the "texture" variable, it runs correctly. When it uses "structureDevice->texture", it crashes at runtime.

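Can someone show a simple piece of code that successfully embeds a texture object in a structure passed to a kernel and does not crash at runtime? Or can someone point out where the error in our code might be?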

Source

2015-11-29 roger1994

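Why don't you just pass the structure by value instead of by reference? – talonmies

@talonmies Passing by value causes a kernel runtime error as soon as the kernel is called. I changed the parameter list to accept the structure by value, but passing the structure still fails at runtime. kernel<<<1,1>>>(texture, *structureDevice); – roger1994

That is not passing the structure by value. It dereferences a device pointer on the host, which is clearly illegal. Just create the structure in host memory and pass it by value. Supported architectures accept kernel argument lists of up to 4 KB, so there is no practical size limit when passing by value. – talonmies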

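Passing the structure by value gave a working solution. Here is the code that made it work. Thanks to @talonmies for the suggestion.

Although a structure simplifies the parameter list, it may slow execution, because the system must make two global-memory accesses instead of one: one to fetch the structure and one to fetch the texture. To improve performance, the structure can be copied into shared memory.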

// Create the Texture Object 
cudaResourceDesc resource; 
memset(&resource, 0, sizeof(resource)); 
resource.resType = cudaResourceTypeLinear; 
resource.res.linear.devPtr = intArray; 
resource.res.linear.desc.f = cudaChannelFormatKindSigned; 
resource.res.linear.desc.x = 32; // bits per channel 
resource.res.linear.sizeInBytes = count*sizeof(int); 
cudaCreateTextureObject(&texture, &resource, &textureDescription, NULL); 

// These structure declarations are in the .h file 
typedef struct SampleStructure { 
    cudaTextureObject_t texture; 
} SampleStructure; 
SampleStructure structureHost; 

// Assign the texture object to the host structure 
structureHost.texture = texture; 

// Pass Texture and Texture-object-embedded-in-structure to kernel 
kernel<<<1,1>>>(texture, structureHost); 

... 
__global__ void 
kernel(cudaTextureObject_t texture, SampleStructure structureDevice) { 
    __shared__ SampleStructure structureSharedMemory; 

    // Copy the structure to shared memory for faster access 
    if (threadIdx.x == 0) 
        structureSharedMemory = structureDevice; 
    __syncthreads(); // barrier: every thread waits until the copy is complete 

    value = tex1Dfetch<int>(texture, index); // Runs successfully at runtime 
    value = tex1Dfetch<int>(structureSharedMemory.texture, index); // Runs successfully at runtime 
}
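
For anyone who wants something that builds on its own, here is a minimal sketch of the by-value approach with the missing declarations filled in and every runtime call checked. The CHECK_CUDA macro, the copyThroughTexture kernel, the countMismatches function, the extra count field and the test data are all names I made up for illustration; none of them come from the original code. It skips the shared-memory copy and reads the texture handle straight from the kernel argument.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: print the failing call and exit on any CUDA error.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

struct SampleStructure {
    cudaTextureObject_t texture; // opaque handle, copied by value
    int count;                   // assumption: added here for bounds checks
};

// The structure arrives by value as a kernel argument.
__global__ void copyThroughTexture(SampleStructure s, int *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < s.count)
        out[i] = tex1Dfetch<int>(s.texture, i);
}

// Returns how many elements read through the texture differ from the input.
int countMismatches(int count) {
    std::vector<int> host(count);
    for (int i = 0; i < count; ++i) host[i] = 3 * i + 1; // arbitrary test data

    int *intArray = nullptr, *out = nullptr;
    size_t bytes = count * sizeof(int);
    CHECK_CUDA(cudaMalloc(&intArray, bytes));
    CHECK_CUDA(cudaMalloc(&out, bytes));
    CHECK_CUDA(cudaMemcpy(intArray, host.data(), bytes, cudaMemcpyHostToDevice));

    // Describe the linear device buffer backing the texture.
    cudaResourceDesc resource;
    memset(&resource, 0, sizeof(resource));
    resource.resType = cudaResourceTypeLinear;
    resource.res.linear.devPtr = intArray;
    resource.res.linear.desc.f = cudaChannelFormatKindSigned;
    resource.res.linear.desc.x = 32; // bits per channel
    resource.res.linear.sizeInBytes = bytes;

    cudaTextureDesc textureDescription;
    memset(&textureDescription, 0, sizeof(textureDescription));
    textureDescription.readMode = cudaReadModeElementType;

    // The structure lives in host memory; only the handle is stored in it.
    SampleStructure s;
    s.texture = 0;
    s.count = count;
    CHECK_CUDA(cudaCreateTextureObject(&s.texture, &resource,
                                       &textureDescription, NULL));

    int threads = 128;
    int blocks = (count + threads - 1) / threads;
    copyThroughTexture<<<blocks, threads>>>(s, out);
    CHECK_CUDA(cudaGetLastError());      // launch errors
    CHECK_CUDA(cudaDeviceSynchronize()); // execution errors

    std::vector<int> result(count);
    CHECK_CUDA(cudaMemcpy(result.data(), out, bytes, cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int i = 0; i < count; ++i)
        if (result[i] != host[i]) ++mismatches;

    CHECK_CUDA(cudaDestroyTextureObject(s.texture));
    CHECK_CUDA(cudaFree(intArray));
    CHECK_CUDA(cudaFree(out));
    return mismatches;
}
```

Calling countMismatches(1000) from a main function and checking that it returns 0 is one way to try it. The checks after the launch are there so that a failing call is reported by name rather than showing up later as an unexplained crash.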

Source

2015-11-30 19:46:32 roger1994
