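How to profile/identify slow steps in a tight processing loop?

I have some proprietary image processing code. It walks over an image and computes some statistics on it. An example of the kind of code I'm talking about is shown below, though this is not the algorithm that needs optimizing.

My question is: what tools are available for profiling these tight loops, to determine where inside them the time goes? Sleepy and Windows Performance Analyzer are more focused on identifying which methods/functions are slow. I already know which function is slow; I just need to figure out how to optimize it.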
void BGR2YUV(IplImage* bgrImg, IplImage* yuvImg)
{
    const int height = bgrImg->height;
    const int width = bgrImg->width;
    const int step = bgrImg->widthStep;
    const int channels = bgrImg->nChannels;

    assert(channels == 3);
    assert(bgrImg->height == yuvImg->height);
    assert(bgrImg->width == yuvImg->width);
    // for reasons that are not clear to me, these are not the same.
    // Code below has been modified to reflect this fact, but if they
    // could be the same, the code below gets sped up a bit.
    // assert(bgrImg->widthStep == yuvImg->widthStep);
    assert(bgrImg->nChannels == yuvImg->nChannels);

    const uchar* bgr = (uchar*) bgrImg->imageData;
    uchar* yuv = (uchar*) yuvImg->imageData;

    for (int i = 0; i < height; i++)
    {
        for (int j = 0; j < width; j++)
        {
            const int ixBGR = i*step+j*channels;
            const int b = (int) bgr[ixBGR+0];
            const int g = (int) bgr[ixBGR+1];
            const int r = (int) bgr[ixBGR+2];

            const int y = (int) (0.299 * r + 0.587 * g + 0.114 * b);
            const double di = 0.596 * r - 0.274 * g - 0.322 * b;
            const double dq = 0.211 * r - 0.523 * g + 0.312 * b;

            // Do some shifting and trimming to get i & q to fit into uchars.
            const int iv = (int) (128 + max(-128.0, min(127.0, di)));
            const int q = (int) (128 + max(-128.0, min(127.0, dq)));

            const int ixYUV = i*yuvImg->widthStep + j*channels;
            yuv[ixYUV+0] = (uchar)y;
            yuv[ixYUV+1] = (uchar)iv;
            yuv[ixYUV+2] = (uchar)q;
        }
    }
}
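When no instruction-level profiler is at hand, one low-tech way to see where the time goes inside a loop like this is differential timing: run the real per-pixel kernel and a stripped-down variant over the same buffer and compare. The following is only a minimal sketch under stated assumptions. `convertPixel`, `copyPixel`, and `bestPassMs` are hypothetical names that do not appear in the code above, and the kernels work on raw interleaved buffers rather than `IplImage`, so that the example builds without OpenCV.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the loop body of BGR2YUV: same arithmetic,
// but reading and writing raw 3-byte pixels instead of an IplImage.
inline void convertPixel(const std::uint8_t* src, std::uint8_t* dst)
{
    const int b = src[0], g = src[1], r = src[2];
    const double di = 0.596 * r - 0.274 * g - 0.322 * b;
    const double dq = 0.211 * r - 0.523 * g + 0.312 * b;
    dst[0] = static_cast<std::uint8_t>(static_cast<int>(0.299 * r + 0.587 * g + 0.114 * b));
    dst[1] = static_cast<std::uint8_t>(static_cast<int>(128 + std::max(-128.0, std::min(127.0, di))));
    dst[2] = static_cast<std::uint8_t>(static_cast<int>(128 + std::max(-128.0, std::min(127.0, dq))));
}

// Baseline variant (an assumption for illustration): the same loads and
// stores as convertPixel, with the floating-point arithmetic removed.
inline void copyPixel(const std::uint8_t* src, std::uint8_t* dst)
{
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
}

// Applies `kernel` to every pixel of `in`, `reps` times over, and returns
// the fastest single pass in milliseconds (the minimum filters out noise).
template <typename Kernel>
double bestPassMs(Kernel kernel, const std::vector<std::uint8_t>& in,
                  std::vector<std::uint8_t>& out, int reps)
{
    assert(reps > 0 && in.size() == out.size() && in.size() % 3 == 0);
    double best = -1.0;
    for (int rep = 0; rep < reps; rep++) {
        const auto start = std::chrono::steady_clock::now();
        for (std::size_t k = 0; k < in.size(); k += 3)
            kernel(&in[k], &out[k]);
        const auto stop = std::chrono::steady_clock::now();
        const double ms = std::chrono::duration<double, std::milli>(stop - start).count();
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}
```

Calling `bestPassMs(convertPixel, in, out, reps)` and `bestPassMs(copyPixel, in, out, reps)` on the same image-sized buffer gives two numbers whose difference is a rough estimate of what the arithmetic costs on top of the memory traffic. An optimizing compiler may transform either loop (for example, turning the copy into a block move), so the result is a hint about where to look, not a precise measurement.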
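Use a decent sampling profiler. If you're stuck on Windows, your best bet is Intel's VTune. –
If you know which function takes a lot of time, disassemble it and look closely at the instructions generated for the inner loop. – TJD
Visual Studio doesn't have the best profiler, but it's pretty good. Google for 'visual studio profiler', or google your_favourite_os profiler. – Sam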
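To make the disassembly suggestion easier to act on, it can help to move the hot loop into its own non-inlined function, so that it shows up as a separate symbol in the assembly listing and in a sampling profiler's output. The sketch below is an illustration only: `NOINLINE` and `lumaRow` are hypothetical names, and the function covers just the luma part of the inner loop on a raw buffer.

```cpp
#include <cstdint>

// Compiler-specific spelling of "never inline this function".
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Hypothetical kernel: computes the Y value for one row of interleaved
// BGR pixels. Kept out of line so its instructions are easy to locate.
NOINLINE void lumaRow(const std::uint8_t* bgr, std::uint8_t* y, int width)
{
    for (int j = 0; j < width; j++) {
        const int b = bgr[3 * j + 0];
        const int g = bgr[3 * j + 1];
        const int r = bgr[3 * j + 2];
        y[j] = static_cast<std::uint8_t>(static_cast<int>(0.299 * r + 0.587 * g + 0.114 * b));
    }
}
```

With MSVC, compiling with `/O2 /FAs` writes an assembly listing with the source interleaved; with GCC or Clang, `-O2 -S` emits the assembly. In the listing for a loop like this, the things worth checking are likely to be the integer-to-double and double-to-integer conversions per pixel and whether the loop was vectorized, though what the compiler actually generates depends on the toolchain and flags.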