2015-11-29 41 views
1

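We have successfully used the post linked below to create structures that contain pointers to basic types such as int *. Textures give a good performance boost for read-only arrays. We use many of them, which makes the parameter lists of our kernels and kernel sub-functions long and complicated. We would like to embed textures in a structure to reduce parameter-list length and complexity. How can a CUDA texture object be embedded in a structure?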

Copying a struct containing pointers to CUDA device

Here is a representative snippet of the approach we are using. It compiles but crashes at runtime.

// Initialize texture description 
cudaTextureDesc textureDescription; 
memset(&textureDescription, 0, sizeof(textureDescription)); 
textureDescription.readMode = cudaReadModeElementType; 

// Create Texture from variable 
cudaTextureObject_t texture = 0; 
cudaResourceDesc resource; 
memset(&resource, 0, sizeof(resource)); 
resource.resType = cudaResourceTypeLinear; 
resource.res.linear.devPtr = intArray; 
resource.res.linear.desc.f = cudaChannelFormatKindSigned; 
resource.res.linear.desc.x = 32; // bits per channel 
resource.res.linear.sizeInBytes = count*sizeof(int); 
cudaCreateTextureObject(&texture, &resource, &textureDescription, NULL); 

// These declarations are in the .h file 
typedef struct SampleStructure { 
    cudaTextureObject_t texture; 
} SampleStructure; 
SampleStructure *structureHost; 
SampleStructure *structureDevice; 

// Create host and device structures 
structureHost = (SampleStructure *)malloc(sizeof(SampleStructure)); 
cudaMalloc(&structureDevice, sizeof(SampleStructure)); 

// Assign the texture object to the host structure 
structureHost->texture = texture; 

// Copy the host structure to Global Memory 
cudaMemcpy(structureDevice, structureHost, sizeof(SampleStructure), cudaMemcpyHostToDevice); 

// Pass Texture and Texture-embedded-in-structure to kernel 
kernel<<<1,1>>>(texture, structureDevice); 

... 
__global__ void 
kernel(cudaTextureObject_t texture, SampleStructure *structureDevice) { 
    value = tex1Dfetch<int>(texture, index); // Runs successfully at runtime 
    value = tex1Dfetch<int>(structureDevice->texture, index); // Crashes at runtime 
} 

When the kernel (or a sub-function) uses the "texture" variable, it runs correctly. When it uses "structureDevice->texture", it crashes at runtime.

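Could someone show simple code that successfully embeds a texture object in a structure passed to a kernel, without crashing at runtime? Or could someone point out where the error in our code might be?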
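None of the runtime calls in the snippet above check their return status, so a failure in, for example, cudaCreateTextureObject or cudaMemcpy would only show up later as a kernel crash. A minimal sketch of status checking follows. The helper names `checkCuda`, `emptyKernel`, and `launchAndCheck` are hypothetical and not part of the original code; only the CUDA runtime calls themselves are real API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): reports a failed
// runtime call on stderr and returns whether the call succeeded.
inline bool checkCuda(cudaError_t status, const char *what) {
    if (status != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
        return false;
    }
    return true;
}

// Placeholder kernel, standing in for the real one.
__global__ void emptyKernel() {}

// A kernel launch returns no status directly. Launch-time errors are read
// with cudaGetLastError(); errors raised while the kernel executes surface
// at the next synchronizing call, here cudaDeviceSynchronize().
bool launchAndCheck() {
    emptyKernel<<<1, 1>>>();
    if (!checkCuda(cudaGetLastError(), "kernel launch"))
        return false;
    return checkCuda(cudaDeviceSynchronize(), "kernel execution");
}
```

Wrapping each call from the snippet in such a helper, for instance `checkCuda(cudaCreateTextureObject(...), "cudaCreateTextureObject")`, would show which step is the first to fail.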

+0

Why don't you just pass the structure by value rather than by reference? – talonmies

+0

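@talonmies Passing by value causes a runtime error as soon as the kernel is called. I changed the parameter list to accept the structure by value, but passing the structure causes a runtime failure. kernel<<<1,1>>>(texture, *structureDevice); – roger1994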

+2

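That is not passing the structure by value. That is dereferencing a device pointer on the host, which is obviously illegal. Just create a structure in host memory and pass it by value. Supported architectures accept kernel argument lists of up to 4 KB, so there is no practical size limit when passing by value. – talonmies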

Answer

2

Passing the structure by value gave a working solution. Here is the code that made it work. Thanks to @talonmies for the suggestion.

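Although the structure simplifies the parameter lists, it may slow execution, because the system has to make two calls to global memory instead of one: one to fetch the structure and one to fetch the texture. To improve performance, the structure can be copied to shared memory.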

// Create the Texture Object 
cudaResourceDesc resource; 
memset(&resource, 0, sizeof(resource)); 
resource.resType = cudaResourceTypeLinear; 
resource.res.linear.devPtr = intArray; 
resource.res.linear.desc.f = cudaChannelFormatKindSigned; 
resource.res.linear.desc.x = 32; // bits per channel 
resource.res.linear.sizeInBytes = count*sizeof(int); 
cudaCreateTextureObject(&texture, &resource, &textureDescription, NULL); 

// These structure declarations are in the .h file 
typedef struct SampleStructure { 
    cudaTextureObject_t texture; 
} SampleStructure; 
SampleStructure structureHost; 

// Assign the texture object to the host structure 
structureHost.texture = texture; 

// Pass Texture and Texture-object-embedded-in-structure to kernel 
kernel<<<1,1>>>(texture, structureHost); 

... 
__global__ void 
kernel(cudaTextureObject_t texture, SampleStructure structureDevice) { 
    __shared__ SampleStructure structureSharedMemory; 

    // Copy the structure to shared memory for faster access 
    if (threadIdx.x == 0) 
        structureSharedMemory = structureDevice; 
    // Block-wide barrier: no thread reads the copy before thread 0 has written it 
    __syncthreads(); 

    value = tex1Dfetch<int>(texture, index); // Runs successfully at runtime 
    value = tex1Dfetch<int>(structureSharedMemory.texture, index); // Runs successfully at runtime 
} 
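
For reference, the pass-by-value approach above can be put together into a self-contained sketch along these lines. It assumes a device that supports texture objects (compute capability 3.0 or later). The names `CUDA_TRY`, `copyThroughTexture`, and `runPassByValueDemo`, the output buffer, and the test data are illustrative additions, not part of the original code.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Illustrative macro (not from the original post): on failure, print the
// error and make the enclosing function return -1. Early returns skip the
// cleanup at the end, which is acceptable for a sketch only.
#define CUDA_TRY(call)                                                       \
    do {                                                                     \
        cudaError_t status_ = (call);                                        \
        if (status_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(status_)); \
            return -1;                                                       \
        }                                                                    \
    } while (0)

struct SampleStructure {
    cudaTextureObject_t texture;
};

// Conservative guard based on the 4 KB argument limit quoted in the comments.
static_assert(sizeof(SampleStructure) <= 4096, "struct too large to pass by value");

// Each thread reads one element through the texture embedded in the struct.
__global__ void copyThroughTexture(SampleStructure s, int *out, int count) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < count)
        out[index] = tex1Dfetch<int>(s.texture, index);
}

// Returns the number of mismatching elements, or -1 on a CUDA error.
int runPassByValueDemo(int count) {
    std::vector<int> host(count);
    for (int i = 0; i < count; ++i)
        host[i] = 3 * i + 1;  // arbitrary test pattern

    int *intArray = nullptr;
    int *out = nullptr;
    CUDA_TRY(cudaMalloc(&intArray, count * sizeof(int)));
    CUDA_TRY(cudaMalloc(&out, count * sizeof(int)));
    CUDA_TRY(cudaMemcpy(intArray, host.data(), count * sizeof(int),
                        cudaMemcpyHostToDevice));

    // Describe the linear device buffer as a 32-bit signed integer texture.
    cudaResourceDesc resource;
    memset(&resource, 0, sizeof(resource));
    resource.resType = cudaResourceTypeLinear;
    resource.res.linear.devPtr = intArray;
    resource.res.linear.desc.f = cudaChannelFormatKindSigned;
    resource.res.linear.desc.x = 32;  // bits per channel
    resource.res.linear.sizeInBytes = count * sizeof(int);

    cudaTextureDesc textureDescription;
    memset(&textureDescription, 0, sizeof(textureDescription));
    textureDescription.readMode = cudaReadModeElementType;

    // The struct lives in host memory and is passed to the kernel by value.
    SampleStructure structureHost;
    structureHost.texture = 0;
    CUDA_TRY(cudaCreateTextureObject(&structureHost.texture, &resource,
                                     &textureDescription, NULL));

    int threads = 128;
    int blocks = (count + threads - 1) / threads;
    copyThroughTexture<<<blocks, threads>>>(structureHost, out, count);
    CUDA_TRY(cudaGetLastError());       // launch errors
    CUDA_TRY(cudaDeviceSynchronize());  // execution errors

    std::vector<int> result(count);
    CUDA_TRY(cudaMemcpy(result.data(), out, count * sizeof(int),
                        cudaMemcpyDeviceToHost));

    CUDA_TRY(cudaDestroyTextureObject(structureHost.texture));
    CUDA_TRY(cudaFree(out));
    CUDA_TRY(cudaFree(intArray));

    int mismatches = 0;
    for (int i = 0; i < count; ++i)
        if (result[i] != host[i])
            ++mismatches;
    return mismatches;
}
```

Calling `runPassByValueDemo(1000)` from a host `main` should return 0 if every element read through the embedded texture matches the source array. The shared-memory copy is left out here to keep the sketch focused on passing the structure by value.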