Binding a CUDA texture to a float image

I have a 1-channel float image in C, like this:

int width, height; 
float* img;

I want to pass this image to a CUDA texture. I read the NVIDIA CUDA C Programming Guide (pages 42-43) and a tutorial, and wrote code similar to the following:

main.cpp:

int main() 
{ 
    int width, height; 
    float* h_Input; 
    ReadImage(&h_Input, &width, &height); // My function which reads the image. 
    WriteImage(h_Input, width, height); // works perfectly... 

    float* h_Output = (float*) malloc(sizeof(float) * width * height); 

    CalculateWithCuda(h_Input, h_Output, width,height); 
    WriteImage(h_Output, width, height); // writes an empty-gray colored image.... *WHY???* 
}

kernel.cu:

texture<float, cudaTextureType2D, cudaReadModeElementType> texRef; // 2D float texture 

__global__ void Kernel(float* output, int width, int height) 
{ 
    int i = blockIdx.y * blockDim.y + threadIdx.y; // row number 
    int j = blockIdx.x * blockDim.x + threadIdx.x; // col number 

    if(i < height && j < width) 
    { 
      float temp = tex2D(texRef, i + 0.5f, j + 0.5f); 
      output[i * width + j] = temp ; 
    } 
} 

void CalculateWithCuda(const float* h_input, float* h_output, int width, int height) 
{ 
    float* d_output; 

    // Allocate CUDA array in device memory 
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0,cudaChannelFormatKindFloat); 
    cudaArray* cuArray; 
    cudaMallocArray(&cuArray, &channelDesc, width, height); 
    // Copy to device memory some data located at address h_data in host memory 
    cudaMemcpyToArray(cuArray, 0, 0, h_input, width * height * sizeof(float) , cudaMemcpyHostToDevice); 
    // Set texture parameters 
    texRef.addressMode[0] = cudaAddressModeWrap; 
    texRef.addressMode[1] = cudaAddressModeWrap; 
    texRef.filterMode  = cudaFilterModeLinear; 
    texRef.normalized  = true; 

    // Bind the array to the texture reference 
    cudaBindTextureToArray(texRef, cuArray, channelDesc); 

    // Allocate GPU buffers for the output image .. 
    cudaMalloc(&d_output, sizeof(float) * width * height); 

    dim3 threadsPerBlock(16,16); 
    dim3 numBlocks((width/threadsPerBlock.x) + 1, (height/threadsPerBlock.y) + 1); 

    Kernel<<<numBlocks, threadsPerBlock>>>(d_output, width,height); 

    cudaDeviceSynchronize(); 

    // Copy output vector from GPU buffer to host memory. 
    cudaMemcpy(h_output, d_output, sizeof(float) * width * height, cudaMemcpyDeviceToHost); 

    // Free GPU memory ... 
}

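As the comments in the code say, this kernel should simply read from the texture and write out the same image as its output. Instead, I get an empty (gray) image. I implemented it the same way as the tutorial, so why doesn't the texture work?

I would appreciate it if someone could show me how to fix this...

PS: Of course, this is not the whole code. I only copied the necessary parts. If you need other details, I can provide them as well.

Thanks in advance.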

Asked 2012-05-12 by Sait

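For the sake of clarity, the Programming Guide does not check the return values of any calls in its examples. I'm not sure I agree with that decision, because checking return values really should be ingrained in every CUDA programmer. Please have a look at the SDK samples to see how to check the return value of every CUDA call (including kernel launches), and let us know whether that helps you solve the problem.

@RogerDahl: Actually, I already do that... You mean using 'cudaStatus', right? I don't get any error messages. Everything looks fine, but the code does not produce the image I expect... – Sait

The problem is the line 'texRef.normalized = true;'. I removed it and it works fine, though I don't know why... – Sait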

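When normalized coordinates are enabled, the texture is addressed with coordinates in the range from 0 to 1 (1 excluded). You forgot to convert your integer, threadIdx-based coordinates to normalized ones.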

unsigned int x = blockIdx.x * blockDim.x + threadIdx.x; 
unsigned int y = blockIdx.y * blockDim.y + threadIdx.y; 
float u = x/(float)width; 
float v = y/(float)height;
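To make the conversion above a bit more concrete, here is a minimal sketch, not a definitive implementation. The helper names normalizedCoord and wrapCoord are hypothetical (they are not part of the CUDA API), and the half-texel offset is my own addition so that each thread samples the center of its texel, which is where linear filtering should return the stored value unchanged. wrapCoord only emulates what cudaAddressModeWrap does to a normalized coordinate (it keeps the fractional part). It suggests why the original kernel produced a flat image: with normalized = true, coordinates such as i + 0.5f all wrap to 0.5, so every thread reads the middle of the texture. The kernel part only compiles under nvcc and assumes the legacy texture reference API used in the question.

```cpp
#include <cmath>

// Let the helpers compile both as plain C++ and under nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: normalized coordinate of the center of texel `index`
// along an axis that is `size` texels long. The result lies in [0, 1).
__host__ __device__ inline float normalizedCoord(int index, int size)
{
    return (index + 0.5f) / static_cast<float>(size);
}

// Hypothetical helper: emulates the effect of cudaAddressModeWrap on a
// normalized coordinate by keeping only its fractional part.
__host__ __device__ inline float wrapCoord(float coord)
{
    return coord - std::floor(coord);
}

#ifdef __CUDACC__
// texRef as declared in the question (legacy texture reference API).
texture<float, cudaTextureType2D, cudaReadModeElementType> texRef;

__global__ void Kernel(float* output, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x; // column
    int y = blockIdx.y * blockDim.y + threadIdx.y; // row

    if (x < width && y < height)
    {
        // tex2D takes the horizontal (column) coordinate first,
        // then the vertical (row) coordinate.
        float u = normalizedCoord(x, width);
        float v = normalizedCoord(y, height);
        output[y * width + x] = tex2D(texRef, u, v);
    }
}
#endif
```

For a 4-texel-wide image, normalizedCoord(0, 4) is 0.125 and normalizedCoord(3, 4) is 0.875, whereas the unconverted values 2.5 and 7.5 both wrap to 0.5. Note also that the kernel in the question passes the row index as the first argument of tex2D; the sketch passes the column first.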

Answered 2012-05-12 18:25:15 by geek

Thanks for the answer. I would fix the grammar, but it won't let me. (Hint, hint :))

@marina.k: Yes, I understand the problem now. Thanks for your answer... – Sait
