2016-08-03

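Final update (CUDA NPP median filter on 16-bit images): solved. The WDDM timeout was also part of the problem. I found the solution here: WDDM timeout fix. Thanks, Robert.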

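Update: Thanks to Robert for pointing out that the filter anchor should not be at 0,0. Unfortunately, the code you posted breaks if the filter size is increased, say to 17x17. This may be because you did not account for the borders at the "sides" of the image. In any case, here is the most current code, which still shows the same problem as before...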

//npp 
#include "npp.h" 
#include "nppi.h" 
#include "device_launch_parameters.h" 

#include <iostream> 

int main() { 

    //Image size. 
    int imageWidth = 6592; 
    int imageHeight = 4400; 

    //Misc. 
    int bytesPerPixel = 2; 
    int totalPixels = imageWidth*imageHeight; 
    int filterSize = 17; 
    int halfFilter = filterSize/2; 
    cudaError success2; 
    NppStatus success1; 

    //Mask & Origin for CUDA. 
    NppiSize cudaMask; 
    cudaMask.height = filterSize; 
    cudaMask.width = filterSize; 
    NppiPoint cudaAnchor; 
    cudaAnchor.x = halfFilter; 
    cudaAnchor.y = halfFilter; 

    //ROI for CUDA. 
    int left = halfFilter; 
    int right = (imageWidth-1) - halfFilter; 
    int top = halfFilter; 
    int bot = (imageHeight-1) - halfFilter; 
    NppiSize cudaROI; 
    cudaROI.height = bot - top; 
    cudaROI.width = right - left; 

    //Step size. 
    int step = imageWidth * bytesPerPixel; 

    //Create a new "image". 
    unsigned short* image = new unsigned short[totalPixels]; 
    for(int i=0; i<imageWidth; i++) 
     for(int j=0; j<imageHeight; j++) 
      image[j*imageWidth+i] = 10; 

    //Allocate mem on device. 
    Npp16u *dSrc, *dDst; 
    Npp8u *dBuf; 
    Npp32u bufferSize; 

    //This call always returns a bufferSize==0. That doesn't seem right... 
    success1 = nppiFilterMedianGetBufferSize_16u_C1R(cudaROI, cudaMask, &bufferSize); 
    std::cout << "get bufferSize returned: " << (int)success1 << std::endl; 
    std::cout << bufferSize << std::endl; 
    success2 = cudaMalloc((void**)&dBuf, bufferSize); 
    std::cout << "cudaMalloc 1 returned: " << (int)success2 << std::endl; 
    success2 = cudaMalloc((void**)&dSrc, totalPixels*sizeof(Npp16u)); 
    std::cout << "cudaMalloc 2 returned: " << (int)success2 << std::endl; 
    success2 = cudaMalloc((void**)&dDst, totalPixels*sizeof(Npp16u)); 
    std::cout << "cudaMalloc 3 returned: " << (int)success2 << std::endl; 

    //Copy host image to device. 
    success2 = cudaMemcpy(dSrc, image, totalPixels*sizeof(Npp16u), cudaMemcpyHostToDevice); 
    std::cout << "cudaMemcpy 1 returned: " << (int)success2 << std::endl; 


    //Copy source to destination. 
    success1 = nppiCopy_16u_C1R(dSrc, step, dDst, step, cudaROI); 
    std::cout << "npp Copy 1 returned: " << (int)success1 << std::endl; 


    //Filter. 
    Npp32u offset = top*step + left*bytesPerPixel; 
    success1 = nppiFilterMedian_16u_C1R( dSrc + offset, 
              step, 
              dDst + offset, 
              step, 
              cudaROI, cudaMask, cudaAnchor, dBuf); 
    std::cout << "npp Filter returned: " << (int)success1 << std::endl; 


    //Copy resultant back to host. 
    success2 = cudaMemcpy(image, dDst, totalPixels*sizeof(Npp16u), cudaMemcpyDeviceToHost); 
    std::cout << "cudaMemcpy 2 returned: " << (int)success2 << std::endl; 

    //Clean. 
    success2 = cudaFree(dDst); 
    success2 = cudaFree(dBuf); 
    success2 = cudaFree(dSrc); 
    delete[] image;

    system("pause"); 
    return 0; 

} 

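I am trying to compute a median filter on a 29 MP image. The filter size is 13x13. The image width and height are shown below. For an unknown reason, the code below crashes, and I would like to know whether anyone can tell why.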

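Strange things I have noticed:

  1. The error occurs with nppiFilterMedian_16u_C1R(). The function itself returns a no-error status, but the cudaMemcpy() that follows it fails. Without the filter, cudaMemcpy() works fine.

  2. Also, the buffer-size query for the 16-bit filter always returns a size of 0. I have tested the 8-bit and 32-bit variants, and they return nonzero values...

  3. I think this may be a bug in the NPPI library (?). It seems to be size dependent: if you reduce the image width/height, the function works fine with a 13x13 filter. However, my filter size needs to go up to 31x31.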

Other important information: Windows x64 application, CUDA runtime 7.5, NPP version 7.5. The GPU is a Quadro K2200 (4 GB global memory).

+1

Please post code that actually compiles.

+1

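The median filter is undefined (illegal) when the mask extends outside the defined input image region. The way you have set up your code, that is the case in the image border regions. In the [equivalent Intel IPP documentation](http://hpc.ipp.ac.cn/wp-content/uploads/2015/12/documentation_2016/en/ipp/common/ipp_manual/GUID-5D2F9418-E4F6-4F6C-B0F7-B438CF28EA63.htm) you will note that the output size has to be reduced, i.e.: "To ensure valid operation when image boundary pixels are processed, the application should correctly define additional border pixels". You are violating this rule.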

+0

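Robert, I have posted code that compiles. I am still looking into what you mentioned about border pixels. For now, I am just trying to shrink the ROI so that no border replication/copying is needed.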

Answer

1

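The median filter function passes a mask over the image, point by point. The mask has the specified dimensions (9x9 in your original code). The anchor point determines how this mask is positioned for each pixel. When the anchor point is 0,0, the mask is positioned like this: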

p** 
*** 
*** 

where p represents the pixel location and the mask size is 3x3. For an anchor point of 1,1, the mask positioning for each pixel would be:

*** 
*p* 
*** 

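Thus we see that the anchor point, together with the mask size, determines a certain boundary around each pixel that must be accessible to the median filter function. When processing pixels at the image border, we must ensure that this boundary lands on valid pixels.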
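The relationship between mask size, anchor, and the required boundary can be sketched in a few lines of plain C++. This is only an illustration: `Size2D`, `Point2D`, `Margins`, and `maskMargins` are hypothetical names (not NPP types or functions), and the sketch assumes the mask covers the pixels from `p - anchor` to `p - anchor + mask - 1` in each dimension, which is the layout drawn in the two diagrams above.

```cpp
#include <cassert>

// Hypothetical helper types for illustration; these are not NPP types.
struct Size2D  { int width; int height; };
struct Point2D { int x; int y; };
struct Margins { int left; int right; int top; int bottom; };

// Number of neighbouring pixels the mask covers on each side of the pixel
// being filtered, assuming the mask spans [p - anchor, p - anchor + mask - 1]
// in both dimensions (the layout drawn in the diagrams above).
Margins maskMargins(Size2D mask, Point2D anchor) {
    assert(anchor.x >= 0 && anchor.x < mask.width);
    assert(anchor.y >= 0 && anchor.y < mask.height);
    Margins m;
    m.left   = anchor.x;
    m.right  = mask.width  - 1 - anchor.x;
    m.top    = anchor.y;
    m.bottom = mask.height - 1 - anchor.y;
    return m;
}
```

Under that assumption, a 3x3 mask with anchor 0,0 needs no pixels to the left or above and two pixels to the right and below, while anchor 1,1 needs one pixel on every side.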

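The case you started with, a 9x9 mask and a 0,0 anchor point, means that we only need "extra" boundary pixels at the "end" of the image. The modification is therefore simple: restrict the ROI height so that the last few rows of the image, corresponding to the mask dimension, are not processed. For this case we can simply subtract 10 from the ROI height, and the errors go away: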

$ cat t1223.cu 
//npp 
#include "npp.h" 
#include "nppi.h" 
#include <iostream> 

int main() { 

//When the filter size is 9x9.... 
int imageWidth = 6592; //breaks if > 5914 && imageHeight = 4400 
int imageHeight = 4400; //breaks if > 3946 && imageWidth = 6592 

//Misc. 
int bytesPerPixel = 2; 
int totalPixels = imageWidth*imageHeight; 
cudaError success2; 
NppStatus success1; 

//ROI for CUDA. 
NppiSize cudaROI; 
cudaROI.height = imageHeight-10; 
cudaROI.width = imageWidth; 

//Mask & Origin for CUDA. 
NppiSize cudaMask; NppiPoint cudaAnchor; 
cudaMask.height = 9; //filter size 
cudaMask.width = 9; 
cudaAnchor.x = 0; 
cudaAnchor.y = 0; 

//Step size. 
int step = imageWidth * bytesPerPixel; 

//Create a new "image". 
unsigned short* image = new unsigned short[totalPixels]; 
for(int i=0; i<imageWidth; i++) 
    for(int j=0; j<imageHeight; j++) 
     image[j*imageWidth+i] = 10; 


//Allocate mem on device. 
Npp16u *dSrc, *dDst; 
Npp8u *dBuf; 
Npp32u bufferSize; 

//This call always returns a bufferSize==0. That doesn't seem right... 
success1 = nppiFilterMedianGetBufferSize_16u_C1R(cudaROI, cudaMask, &bufferSize); 
std::cout << "get bufferSize returned: " << (int)success1 << std::endl; 
std::cout << bufferSize << std::endl; 
success2 = cudaMalloc((void**)&dBuf, bufferSize); 
std::cout << "cudaMalloc 1 returned: " << (int)success2 << std::endl; 
success2 = cudaMalloc((void**)&dSrc, totalPixels*sizeof(Npp16u)); 
std::cout << "cudaMalloc 2 returned: " << (int)success2 << std::endl; 
success2 = cudaMalloc((void**)&dDst, totalPixels*sizeof(Npp16u)); 
std::cout << "cudaMalloc 3 returned: " << (int)success2 << std::endl; 

//Copy host image to device. 
success2 = cudaMemcpy(dSrc, image, totalPixels*sizeof(Npp16u), cudaMemcpyHostToDevice); 
std::cout << "cudaMemcpy 1 returned: " << (int)success2 << std::endl; 

//Copy source to destination. 
success1 = nppiCopy_16u_C1R(dSrc, step, dDst, step, cudaROI); 
std::cout << "npp Copy 1 returned: " << (int)success1 << std::endl; 

//Filter. 
success1 = nppiFilterMedian_16u_C1R(dSrc, 
            step, 
            dDst, 
            step, 
            cudaROI, cudaMask, cudaAnchor, dBuf); 
std::cout << "npp Filter returned: " << (int)success1 << std::endl; 

//Copy resultant back to host. 
success2 = cudaMemcpy(image, dDst, totalPixels*sizeof(Npp16u), cudaMemcpyDeviceToHost); 
std::cout << "cudaMemcpy 2 returned: " << (int)success2 << std::endl; 

//Clean. 
success2 = cudaFree(dBuf); 
success2 = cudaFree(dSrc); 
success2 = cudaFree(dDst); 
delete[] image;

return 0; 
} 
$ nvcc -arch=sm_35 -o t1223 t1223.cu -lnppi 
$ cuda-memcheck ./t1223 
========= CUDA-MEMCHECK 
get bufferSize returned: 0 
0 
cudaMalloc 1 returned: 0 
cudaMalloc 2 returned: 0 
cudaMalloc 3 returned: 0 
cudaMemcpy 1 returned: 0 
npp Copy 1 returned: 0 
npp Filter returned: 0 
cudaMemcpy 2 returned: 0 
========= ERROR SUMMARY: 0 errors 
$ 

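Note that if the anchor point is moved (say, to 4,4 instead of 0,0 as in the case above), then "boundary" pixels need to be available for ~5 rows before the start of the image. We can account for this by setting the ROI correctly and also by offsetting the start of processing, adding a row offset to the source pointer passed to the median filter, like this: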

success1 = nppiFilterMedian_16u_C1R(dSrc + 5*imageWidth, 
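As a rough sketch of how the ROI and the starting offset follow from the mask and anchor, the helper below computes both for an unpadded row-major image. `SafeRegion` and `safeRegion` are hypothetical names introduced here, not part of NPP, and the sketch assumes the mask covers `p - anchor` through `p - anchor + mask - 1` in each dimension. The offset is expressed in elements, so it can be added directly to a typed pointer such as `dSrc`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for illustration; not part of NPP.
struct SafeRegion {
    int roiWidth;        // columns that can be filtered
    int roiHeight;       // rows that can be filtered
    std::size_t offset;  // start of the ROI, in elements (not bytes)
};

// Largest ROI for which every mask position stays inside the image, assuming
// the mask covers [p - anchor, p - anchor + mask - 1] in each dimension and
// the image is stored row-major with no padding between rows.
SafeRegion safeRegion(int imageWidth, int imageHeight,
                      int maskWidth, int maskHeight,
                      int anchorX, int anchorY) {
    assert(maskWidth <= imageWidth && maskHeight <= imageHeight);
    assert(anchorX >= 0 && anchorX < maskWidth);
    assert(anchorY >= 0 && anchorY < maskHeight);
    SafeRegion r;
    r.roiWidth  = imageWidth  - (maskWidth  - 1);
    r.roiHeight = imageHeight - (maskHeight - 1);
    r.offset = static_cast<std::size_t>(anchorY) * imageWidth + anchorX;
    return r;
}
```

For the 6592x4400 image with a 9x9 mask and anchor 4,4, this gives an ROI of 6584x4392 starting 4*6592 + 4 elements into the buffer. The 5-row offset and 10-row reduction used in this answer are a slightly more conservative choice along the same lines.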

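Note that I am not trying to give a complete tutorial on median filtering here, only to identify the problems that lead to an actual function failure. The left and right filter borders are also something you may want to consider. At the left and right edges of the image, the mask boundary can index into the previous or next image line, thus "wrapping" the image, perhaps with odd effects in the filtered pixels.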
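The "wrapping" effect is a property of row-major storage and can be shown without any CUDA code. The sketch below uses hypothetical helpers (`Pixel`, `pixelAt`, `neighbourRight`) and assumes an image with no padding between rows. It only describes which memory location a linear index refers to; whether the NPP kernel actually reads those locations is not something this sketch establishes.

```cpp
#include <cassert>
#include <cstddef>

struct Pixel { int x; int y; };  // hypothetical helper type

// Which pixel a linear element index refers to in a row-major image with no
// row padding.
Pixel pixelAt(std::size_t index, int imageWidth) {
    Pixel p;
    p.x = static_cast<int>(index % imageWidth);
    p.y = static_cast<int>(index / imageWidth);
    return p;
}

// Pixel actually addressed when stepping dx columns to the right of (x, y)
// through the linear index, without any bounds check on the column.
Pixel neighbourRight(int x, int y, int dx, int imageWidth) {
    std::size_t index = static_cast<std::size_t>(y) * imageWidth + x + dx;
    return pixelAt(index, imageWidth);
}
```

With a width of 6592, stepping one column to the right of pixel (6591, 10) addresses pixel (0, 11), the first pixel of the next line, rather than anything outside the buffer. That is why an unrestricted ROI width can run without a memory error and still produce odd values near the left and right edges.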

EDIT: In response to the new code posting, the main issue now seems to be that you do not understand how to offset the image.

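In C/C++, if I have a pointer and want to offset it by a certain number of elements, I simply add the number of elements I want to offset it by. There is no need to scale by bytes. If you had studied the offset example I gave previously, you would have noticed that nothing there is scaled by bytes. If we want to offset by 5 rows, it is just 5 multiplied by the image width, as shown above.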
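A small stand-alone sketch makes the difference concrete. The helper names (`elementOffset`, `byteScaledOffset`, `bytesAdvanced`) are hypothetical and the code uses `std::uint16_t` as a stand-in for `Npp16u`; it assumes an unpadded row-major image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Element offset of pixel (x, y) in a row-major image with no row padding.
// Pointer arithmetic on a uint16_t* already advances in units of 2 bytes,
// so this value can be added to the pointer directly.
std::size_t elementOffset(int x, int y, int imageWidth) {
    return static_cast<std::size_t>(y) * imageWidth + x;
}

// The mistaken version: scaling by bytes per pixel before adding to a typed
// pointer. For 16-bit pixels this lands twice as far into the buffer.
std::size_t byteScaledOffset(int x, int y, int imageWidth, int bytesPerPixel) {
    return elementOffset(x, y, imageWidth) * bytesPerPixel;
}

// Distance in bytes between p and p + n for 16-bit elements.
std::ptrdiff_t bytesAdvanced(const std::uint16_t* p, std::size_t n) {
    const std::uint16_t* q = p + n;
    return reinterpret_cast<const char*>(q) - reinterpret_cast<const char*>(p);
}
```

For the 17x17 case in the question (`top = left = 8`, width 6592), the element offset is 8*6592 + 8 = 52744, whereas `top*step + left*bytesPerPixel` evaluates to 105488, i.e. the processing starts 16 rows and 16 columns in instead of 8.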

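Also, you were using cudaROI to size your src->dst copy operation, which did not make sense to me, so I modified that. Finally, I modified the code so that it can be built with either an anchor in the corner or an anchor in the center.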

Here is a modification of your code that compiles and runs correctly for me, in both anchor cases:

$ cat t1225.cu 
//npp 
#include "npp.h" 
#include "nppi.h" 
#include "device_launch_parameters.h" 

#include <iostream> 

int main() { 

    //Image size. 
    int imageWidth = 6592; 
    int imageHeight = 4400; 

    //Misc. 
    int bytesPerPixel = 2; 
    int totalPixels = imageWidth*imageHeight; 
    int filterSize = 17; 
    int halfFilter = filterSize/2; 
    cudaError success2; 
    NppStatus success1; 

    //Mask & Origin for CUDA. 
    NppiSize cudaMask; 
    cudaMask.height = filterSize; 
    cudaMask.width = filterSize; 
    NppiPoint cudaAnchor; 
#ifndef ANCHOR_CORNER 
    cudaAnchor.x = halfFilter; 
    cudaAnchor.y = halfFilter; 
#else 
    cudaAnchor.x = 0; 
    cudaAnchor.y = 0; 
#endif 
    NppiSize imgSize; 
    imgSize.width = imageWidth; 
    imgSize.height = imageHeight; 

    //ROI for CUDA. 
    int left = halfFilter; 
    int right = (imageWidth-1) - halfFilter; 
    int top = halfFilter; 
    int bot = (imageHeight-1) - halfFilter; 
    NppiSize cudaROI; 
    cudaROI.height = bot - top; 
    cudaROI.width = right - left; 

    //Step size. 
    int step = imageWidth * bytesPerPixel; 

    //Create a new "image". 
    unsigned short* image = new unsigned short[totalPixels]; 
    for(int i=0; i<imageWidth; i++) 
     for(int j=0; j<imageHeight; j++) 
      image[j*imageWidth+i] = 10; 

    //Allocate mem on device. 
    Npp16u *dSrc, *dDst; 
    Npp8u *dBuf; 
    Npp32u bufferSize; 

    //This call always returns a bufferSize==0. That doesn't seem right... 
    success1 = nppiFilterMedianGetBufferSize_16u_C1R(cudaROI, cudaMask, &bufferSize); 
    std::cout << "get bufferSize returned: " << (int)success1 << std::endl; 
    std::cout << bufferSize << std::endl; 
    success2 = cudaMalloc((void**)&dBuf, bufferSize); 
    std::cout << "cudaMalloc 1 returned: " << (int)success2 << std::endl; 
    success2 = cudaMalloc((void**)&dSrc, totalPixels*sizeof(Npp16u)); 
    std::cout << "cudaMalloc 2 returned: " << (int)success2 << std::endl; 
    success2 = cudaMalloc((void**)&dDst, totalPixels*sizeof(Npp16u)); 
    std::cout << "cudaMalloc 3 returned: " << (int)success2 << std::endl; 

    //Copy host image to device. 
    success2 = cudaMemcpy(dSrc, image, totalPixels*sizeof(Npp16u), cudaMemcpyHostToDevice); 
    std::cout << "cudaMemcpy 1 returned: " << (int)success2 << std::endl; 


    //Copy source to destination. 
    success1 = nppiCopy_16u_C1R(dSrc, step, dDst, step, imgSize); 
    std::cout << "npp Copy 1 returned: " << (int)success1 << std::endl; 


    //Filter. 
#ifndef ANCHOR_CORNER 
    Npp32u offset = top*imageWidth + left; 
#else 
    Npp32u offset = 0; 
#endif 
    success1 = nppiFilterMedian_16u_C1R( dSrc + offset, 
              step, 
              dDst + offset, 
              step, 
              cudaROI, cudaMask, cudaAnchor, dBuf); 
    std::cout << "npp Filter returned: " << (int)success1 << std::endl; 


    //Copy resultant back to host. 
    success2 = cudaMemcpy(image, dDst, totalPixels*sizeof(Npp16u), cudaMemcpyDeviceToHost); 
    std::cout << "cudaMemcpy 2 returned: " << (int)success2 << std::endl; 

    //Clean. 
    success2 = cudaFree(dDst); 
    success2 = cudaFree(dBuf); 
    success2 = cudaFree(dSrc); 
    delete[] image;

    return 0; 

} 
$ nvcc -o t1225 t1225.cu -lnppi 
$ cuda-memcheck ./t1225 
========= CUDA-MEMCHECK 
get bufferSize returned: 0 
0 
cudaMalloc 1 returned: 0 
cudaMalloc 2 returned: 0 
cudaMalloc 3 returned: 0 
cudaMemcpy 1 returned: 0 
npp Copy 1 returned: 0 
npp Filter returned: 0 
cudaMemcpy 2 returned: 0 
========= ERROR SUMMARY: 0 errors 
$ nvcc -DANCHOR_CORNER -o t1225 t1225.cu -lnppi 
$ cuda-memcheck ./t1225 
========= CUDA-MEMCHECK 
get bufferSize returned: 0 
0 
cudaMalloc 1 returned: 0 
cudaMalloc 2 returned: 0 
cudaMalloc 3 returned: 0 
cudaMemcpy 1 returned: 0 
npp Copy 1 returned: 0 
npp Filter returned: 0 
cudaMemcpy 2 returned: 0 
========= ERROR SUMMARY: 0 errors 
+0

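Robert, this code will break if the filter size is increased. I have just posted a version of the code that includes your suggestions, but NPP still breaks when filterSize is larger.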

+0

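Of course, if you do *nothing* but increase the filter size (let's say to 17x17), then the code will break as-is, because you did not follow the instructions I gave. If I take the exact code in my answer, increase the filter size to 17x17 (instead of 9x9), and *also* reduce the ROI height by 20 (instead of the 10 posted in my answer) to account for the larger filter size, then the code runs successfully to completion for me, just like the previous code.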

+0

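Yes, I agree, and I neglected to say that I had also changed the ROI. When I reduce the ROI height by 20, I get the following error: npp Filter returned: -1000, CUDA ERROR #30 (with FilterMedian kernel #4): unknown error.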