OpenCV for loop with CUDA - parallel processing
I'm stuck on one problem: getting this iterator-based loop to run on CUDA. Can anyone help here?
std::vector<cv::DMatch> matches;
std::vector<cv::KeyPoint> key_pts1, key_pts2;
std::vector<cv::Point2f> points1, points2;
// For each match, collect the coordinates of the matched keypoint in both images.
for (std::vector<cv::DMatch>::const_iterator itr = matches.begin(); itr != matches.end(); ++itr)
{
    // queryIdx indexes the first keypoint set
    float x = key_pts1[itr->queryIdx].pt.x;
    float y = key_pts1[itr->queryIdx].pt.y;
    points1.push_back(cv::Point2f(x, y));
    // trainIdx indexes the second keypoint set
    x = key_pts2[itr->trainIdx].pt.x;
    y = key_pts2[itr->trainIdx].pt.y;
    points2.push_back(cv::Point2f(x, y));
}
Converting the loop above to CUDA for parallel processing, as I had planned, is proving difficult for me.
void dmatchLoopHomography(float *itr, float *match_begin, float *match_end, float *keypoint_1, float *keypoint_2, float *pts1, float *pts2)
{
    float x, y;
    // get device pointers for the mapped host buffers
    unsigned char *mtch_begin, *mtch_end, *keypt_1, *keypt_2, *points1, *points2;
    cudaHostGetDevicePointer(&mtch_begin, match_begin, 0);
    cudaHostGetDevicePointer(&mtch_end, match_end, 0);
    cudaHostGetDevicePointer(&keypt_1, keypoint_1, 0);
    cudaHostGetDevicePointer(&keypt_2, keypoint_2, 0);
    cudaHostGetDevicePointer(&points1, pts1, 0);
    cudaHostGetDevicePointer(&points2, pts2, 0);
    //dim3 blocks(16, 16);
    dim3 threads(itr, itr);
    //kernel
    dmatchLoopHomography_ker<<<itr,itr>>>(mtch_begin, mtch_end, keypt_1, keypt_2, points1, points2);
    cudaDeviceSynchronize();
}
and
__global__ void dmatchLoopHomography_ker(float *itr, float *match_begin, float *match_end, float *keypoint_1, float *keypoint_2, float *pts1, float *pts2)
{
    //how do I go about it ??
}
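One way to approach the empty kernel, offered as a minimal sketch rather than a definitive implementation: the loop has no dependency between iterations, so each match can be handled by one thread, provided the outputs are preallocated and written by index instead of with push_back. The sketch assumes the OpenCV vectors have already been flattened into plain arrays. MatchIdx, gatherMatchedPoints and gatherOnGpu are hypothetical names introduced here for illustration, and CUDA's built-in float2 stands in for cv::Point2f and for the pt field of cv::KeyPoint. Error checking is left out to keep it short.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical plain-data stand-in for cv::DMatch (only the two indices used here).
struct MatchIdx {
    int queryIdx;
    int trainIdx;
};

// One thread per match: thread i gathers the two keypoint positions of match i.
__global__ void gatherMatchedPoints(const MatchIdx* matches, int numMatches,
                                    const float2* keyPts1, const float2* keyPts2,
                                    float2* points1, float2* points2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numMatches) return;  // the last block may be only partly used
    points1[i] = keyPts1[matches[i].queryIdx];
    points2[i] = keyPts2[matches[i].trainIdx];
}

// Host wrapper: copies the inputs to the device, launches the kernel and
// copies the results back. Return codes are ignored for brevity.
void gatherOnGpu(const std::vector<MatchIdx>& matches,
                 const std::vector<float2>& keyPts1,
                 const std::vector<float2>& keyPts2,
                 std::vector<float2>& points1,
                 std::vector<float2>& points2)
{
    const int n = static_cast<int>(matches.size());
    points1.resize(n);
    points2.resize(n);
    if (n == 0) return;

    MatchIdx* dMatches = nullptr;
    float2 *dKp1 = nullptr, *dKp2 = nullptr, *dP1 = nullptr, *dP2 = nullptr;
    cudaMalloc(&dMatches, n * sizeof(MatchIdx));
    cudaMalloc(&dKp1, keyPts1.size() * sizeof(float2));
    cudaMalloc(&dKp2, keyPts2.size() * sizeof(float2));
    cudaMalloc(&dP1, n * sizeof(float2));
    cudaMalloc(&dP2, n * sizeof(float2));

    cudaMemcpy(dMatches, matches.data(), n * sizeof(MatchIdx), cudaMemcpyHostToDevice);
    cudaMemcpy(dKp1, keyPts1.data(), keyPts1.size() * sizeof(float2), cudaMemcpyHostToDevice);
    cudaMemcpy(dKp2, keyPts2.data(), keyPts2.size() * sizeof(float2), cudaMemcpyHostToDevice);

    // Enough blocks of 256 threads to cover every match.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    gatherMatchedPoints<<<blocks, threads>>>(dMatches, n, dKp1, dKp2, dP1, dP2);

    // These blocking copies run after the kernel on the default stream.
    cudaMemcpy(points1.data(), dP1, n * sizeof(float2), cudaMemcpyDeviceToHost);
    cudaMemcpy(points2.data(), dP2, n * sizeof(float2), cudaMemcpyDeviceToHost);

    cudaFree(dMatches);
    cudaFree(dKp1);
    cudaFree(dKp2);
    cudaFree(dP1);
    cudaFree(dP2);
}
```

Output element i corresponds to matches[i], so the ordering of the original loop is preserved. For a match list of ordinary size the host-to-device copies will likely cost more than the gather itself, so this is mainly useful as a learning exercise or when the keypoints already live on the GPU.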
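Thanks @solvingPuzzles for your answer and comments. Yes, I agree with you that feature matching is far more computationally demanding. The motivation for my question was to understand how to solve this simple problem, so that later I can use what I learn about GPUs to move feature matching onto the GPU. Thanks for the Dr. Dobb's link; yes, I've been working through it as well. By the way, thanks for KeyPoint::convert, I didn't know about it before. – Mahesh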
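@timothy Great! As a toy example, I'd suggest coding matrix multiplication in CUDA. Feel free to "cheat" and look at example code. The code won't be very long, but you'll learn the CUDA boilerplate (threadIdx, blockIdx, cudaMemcpy, grid, block, etc.). – solvingPuzzles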
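For reference, the toy example suggested in this comment might look roughly like the following. It is a naive sketch under stated assumptions: square n x n matrices stored row-major, one thread per output element, no shared-memory tiling and no error checking. The names matMulKernel and matMul are made up for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Naive kernel: each thread computes one element C[row][col] of C = A * B.
__global__ void matMulKernel(const float* A, const float* B, float* C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;  // threads outside the matrix do nothing

    float sum = 0.0f;
    for (int k = 0; k < n; ++k)
        sum += A[row * n + k] * B[k * n + col];
    C[row * n + col] = sum;
}

// Host side: allocate device buffers, copy inputs over, launch, copy the result back.
std::vector<float> matMul(const std::vector<float>& A, const std::vector<float>& B, int n)
{
    std::vector<float> C(static_cast<size_t>(n) * n);
    const size_t bytes = C.size() * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    // 16 x 16 threads per block; enough blocks in each dimension to cover n.
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matMulKernel<<<grid, block>>>(dA, dB, dC, n);

    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

The 2D grid and block layout is a design choice that maps naturally onto rows and columns; a 1D launch with a computed row and column would also work.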