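Trying to understand floating-point numerical error with OpenMP
I recently started using OpenMP to multi-thread (MT) my image processing project.
I have no problems with any of the routines except one (it is not computationally heavy, but it does more floating-point operations than the others, which work mostly on ints).
First, let's say the single-threaded (ST) result is image X and the MT result is image Y.
When averaging over a small window, X == Y, but when the window gets bigger (5x5), X != Y.
So I added some prints to see the values at specific pixels and, boom, with the prints in place X == Y again. This is what I want to understand: why do the results go back to X when I print in that code?
Note that I tried changing the floating-point model (Intel compiler) to precise and to extended. Both models give equal ST and MT results, but the new ST result Z != X, where X is the result with the default floating-point model.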
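One possible explanation, offered as an assumption rather than a confirmed diagnosis: float addition is not associative, and a relaxed floating-point model lets the compiler reorder or vectorize a sum. Adding a printf inside the loop can change which code the compiler generates. The sketch below uses two hypothetical helpers, sumSequential and sumInLanes, to add the same four values in two different orders and get two different totals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: plain left-to-right accumulation in float.
float sumSequential(const std::vector<float>& values) {
    float total = 0.0f;
    for (float v : values) total += v;
    return total;
}

// Hypothetical helper: accumulate into `lanes` interleaved partial sums and
// then combine them, roughly what a vectorized reduction does.
float sumInLanes(const std::vector<float>& values, std::size_t lanes) {
    assert(lanes > 0);
    std::vector<float> partial(lanes, 0.0f);
    for (std::size_t k = 0; k < values.size(); ++k) {
        partial[k % lanes] += values[k];
    }
    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

With the input {1e8f, 1.0f, -1e8f, 1.0f} and strict IEEE single-precision evaluation, sumSequential returns 1 while sumInLanes with two lanes returns 2, because 1e8f + 1.0f rounds back to 1e8f. Neither order is "wrong"; they are just different roundings of the same mathematical sum.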
Edit: the current code:
    const int tileOffset = 1;
    unsigned char** texturePtr = (unsigned char**)texture->getRowPtr();
    short** wrkSrcPtr = (short**)wrkSrc->getRowPtr();
    short** imFitAPtr = (short**)imFitA->getRowPtr();
    short** imFitBPtr = (short**)imFitB->getRowPtr();
    short** imFitCPtr = (short**)imFitC->getRowPtr();
    // now, compute raw texture value for each pixel using the above plane equations
    #pragma omp parallel num_threads(g_options->ompNumberThreads) if(g_options->ompThreaded)
    {
        #pragma omp for
        for (int i = 0; i < src->getHeight(); i = i + tileOffset) {
            for (int j = 0; j < src->getWidth(); j = j + tileOffset) {
                bool printPoint = false;
                int jVal = 333;
                int iVal = 99;
                if (j == jVal && i == src->getHeight() - iVal - 1) {
                    printPoint = true;
                    printf("\n\nAt (%d, %d) with Thread %d \n", jVal, iVal, omp_get_thread_num());
                }
                jVal = 343;
                iVal = 204;
                if (j == jVal && i == src->getHeight() - iVal - 1) {
                    printPoint = true;
                    printf("\n\nAt (%d, %d) with Thread %d \n", jVal, iVal, omp_get_thread_num());
                }
                const int ti = i * tileOffset;
                const int tj = j * tileOffset;
                const float planeA = imFitAPtr[i][j]/32000.0f*255.0f;
                const float planeB = imFitBPtr[i][j]/32000.0f*255.0f;
                const float planeC = imFitCPtr[i][j]/32000.0f*255.0f;
                float sum2 = 0.0f;
                float sum = 0.0f;
                int nbSum = 0;
                if (printPoint) {
                    printf("Fit (A,B,C) = (%d, %d, %d) and In float (%f, %f, %f) \n",
                           imFitAPtr[i][j], imFitBPtr[i][j], imFitCPtr[i][j],
                           planeA, planeB, planeC);
                }
                for (int ri = i - halfROI; ri <= i + halfROI; ri++) {
                    for (int rj = j - halfROI; rj <= j + halfROI; rj++) {
                        // sanity checks (image boundaries)
                        if (ri < 0 || ri >= src->getHeight() || rj < 0 || rj >= src->getWidth()) continue;
                        // eval the local plane at that pixel and compute the residual
                        const float localPlaneValue = planeA * (rj - j) + planeB * (ri - i) + planeC;
                        const float residual = wrkSrcPtr[ri][rj]/32000.0f*255.0f - localPlaneValue;
                        const float rr = residual*residual;
                        if (printPoint)
                            printf("Local: %f, residual: %f, resSQ: %f, sum2: %f and sum: %f \n ", localPlaneValue, residual, rr, sum2, sum);
                        sum2 += rr;
                        sum += residual;
                        nbSum++;
                        if (printPoint)
                            printf("Add sum2: %f, add sum: %f and nb: %d \n ", sum2, sum, nbSum);
                    }
                }
                if (printPoint)
                    printf("\n");
                // the texture for that pixel is the stdev
                float texVal = 0.0f;
                if (nbSum > 1) {
                    texVal = sqrtf(max((sum2 - sum * sum/nbSum)/(nbSum - 1), 0.0f)) * scaling;
                    if (texVal > 255.0f) texVal = 255;
                }
                texturePtr[ti][tj] = (unsigned char)texVal;
                if (printPoint)
                    printf("Final value : %d (In float: %f) \n\n", texturePtr[ti][tj], texVal);
            }
        }
    } // End OMP
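With the "outside prints" I notice that the squared residual (rr) and the sum of squares (sum2) are the values that are not stable between ST and MT.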
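The expression sum2 - sum * sum/nbSum subtracts two numbers of similar size, so a tiny difference in how sum2 was rounded can be magnified in the result and then change the truncated unsigned char. As a sketch only (the helper names are hypothetical and this is not the original poster's code), the single-pass float formula can be compared with Welford's running update using double accumulators, which avoids that subtraction:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the single-pass formula from the posted loop, in float.
float sampleVarianceNaive(const std::vector<float>& values) {
    assert(values.size() > 1);
    float sum = 0.0f;
    float sum2 = 0.0f;
    for (float v : values) {
        sum += v;
        sum2 += v * v;
    }
    const float n = static_cast<float>(values.size());
    return std::max((sum2 - sum * sum / n) / (n - 1.0f), 0.0f);
}

// Hypothetical helper: Welford's running update with double accumulators.
// It never subtracts two large, nearly equal sums.
double sampleVarianceWelford(const std::vector<float>& values) {
    assert(values.size() > 1);
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean
    std::size_t count = 0;
    for (float v : values) {
        ++count;
        const double delta = v - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (v - mean);
    }
    return m2 / static_cast<double>(count - 1);
}
```

On small, well-scaled inputs both helpers agree. The Welford form is meant to be less sensitive to accumulation order and to intermediate precision, which may make ST and MT builds easier to compare; whether it removes the difference seen here is untested.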
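There is likely a bug in your code (either in the parallel version, or one in the initial version that only becomes apparent once it is parallelized). Just post the code and we'll see. – Gilles
If the presence or absence of print statements affects the result of a computation, there is very likely undefined behaviour (a.k.a. a bug) somewhere in that computation. – Angew
Are you perhaps using '==' to compare floating-point numbers? – user463035818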
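On the last comment: if the comparison of X and Y is done on floating-point values, an exact == will flag differences that are only rounding noise. A minimal sketch of a tolerance-based comparison follows; the function name nearlyEqual and the default tolerances are assumptions chosen for illustration, not values taken from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true when a and b differ by no more than an absolute
// tolerance or a tolerance relative to the larger magnitude, whichever is larger.
bool nearlyEqual(double a, double b, double absTol = 1e-9, double relTol = 1e-6) {
    assert(absTol >= 0.0 && relTol >= 0.0);
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(absTol, relTol * scale);
}
```

For example, nearlyEqual(0.1 + 0.2, 0.3) is true, while nearlyEqual(1.0, 1.1) is false. For the final unsigned char images, a similar idea would be to accept pixels that differ by at most one grey level.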