2014-04-01 53 views
0

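Parallelizing the clustering method in OpenCV

I am training a FabMap algorithm for loop-closure detection for my project. Training consists of creating the descriptors, the vocabulary, and the Chow-Liu tree. I have a database of more than 10,000 images. I am using a fairly good desktop (12 cores with two threads each, 32 GB of RAM and a 6 GB Nvidia graphics card), and I would like to take full advantage of it when training my system. I am using OpenCV 3.0 on 64-bit Windows 7, with TBB enabled.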

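The problem is that only the descriptor extraction is multithreaded. Clustering and building the Chow-Liu tree run in a single thread. The cluster() method of the BOWMSCTrainer class has 3 nested for() loops, each of which depends on the previous one, and even the sizes of the nested loops are assigned dynamically. This is the core of the cluster() method: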

//_descriptors is a Matrix wherein each row is a descriptor 

Mat icovar = Mat::eye(_descriptors.cols,_descriptors.cols,_descriptors.type()); 

std::vector<Mat> initialCentres; 
initialCentres.push_back(_descriptors.row(0)); 
for (int i = 1; i < _descriptors.rows; i++) { 
    double minDist = DBL_MAX; 
    for (size_t j = 0; j < initialCentres.size(); j++) { 
     minDist = std::min(minDist, 
      cv::Mahalanobis(_descriptors.row(i),initialCentres[j], 
      icovar)); 
    } 
    if (minDist > clusterSize) 
     initialCentres.push_back(_descriptors.row(i)); 
} 

std::vector<std::list<cv::Mat> > clusters; 
clusters.resize(initialCentres.size()); 
for (int i = 0; i < _descriptors.rows; i++) { 
    int index = 0; double dist = 0, minDist = DBL_MAX; 
    for (size_t j = 0; j < initialCentres.size(); j++) { 
     dist = cv::Mahalanobis(_descriptors.row(i),initialCentres[j],icovar); 
     if (dist < minDist) { 
      minDist = dist; 
      index = (int)j; 
     } 
    } 
    clusters[index].push_back(_descriptors.row(i)); 
} 

// TODO: throw away small clusters. 

Mat vocabulary; 
Mat centre = Mat::zeros(1,_descriptors.cols,_descriptors.type()); 
for (size_t i = 0; i < clusters.size(); i++) { 
    centre.setTo(0); 
    for (std::list<cv::Mat>::iterator Ci = clusters[i].begin(); Ci != clusters[i].end(); Ci++) { 
     centre += *Ci; 
    } 
    centre /= (double)clusters[i].size(); 
    vocabulary.push_back(centre); 
} 

return vocabulary; 
} 

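To see how long training would take, I downsampled the database. I started with 10 images (~20,000 descriptors), which took about 40 minutes. For a sample of 100 images (~300,000 descriptors), the whole process took about 60 hours, and I am afraid that 1,000 images (which would produce a decent vocabulary) could take 8 months (!) (if the method is O(n²), then 60 hours × 10² ≈ 8 months). I don't want to think about how long the whole database would take.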
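The estimate above can be reproduced with a small helper. This is only a sketch: scalingExponent and extrapolateHours are hypothetical names introduced here, and the model (time proportional to size raised to a fixed exponent) is an assumption. With an exponent of 2, 60 hours scaled by a factor of 10 gives 6,000 hours, which is 250 days, or roughly 8 months.

```cpp
#include <cmath>

// Estimate the exponent p in "time ~ size^p" from two measurements
// (size1, time1) and (size2, time2). Times must use the same unit.
double scalingExponent(double size1, double time1, double size2, double time2) {
    return std::log(time2 / time1) / std::log(size2 / size1);
}

// Extrapolate the running time for a problem `factor` times larger than
// the measured one, assuming time grows as size^exponent.
double extrapolateHours(double knownHours, double factor, double exponent) {
    return knownHours * std::pow(factor, exponent);
}
```

Applied to the two timings reported above (40 minutes for ~20,000 descriptors, 60 hours for ~300,000), scalingExponent gives a value somewhat below 2, so the quadratic figure may be pessimistic. Two measurements are too few to be sure either way.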

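So, my question is: is it possible to parallelize the execution of the cluster() method somehow, so that training the system does not take an enormous amount of time? I have thought about applying OpenMP pragmas, or creating a thread for each loop, but I don't think that is possible given the dynamic nature of the for() loops. Although I am familiar with parallel programming and multithreading, I am no expert in this area.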

Thanks a lot in advance!

Answer

1

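For what it's worth, I am leaving here the code I came up with, which uses OpenCV's parallel_for_ calls. I also added a feature to the code: it now removes all clusters smaller than a threshold. The code effectively speeds up the process: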

//The first nest of fors remains untouched, but the following ones: 

std::vector<std::list<cv::Mat> > clusters; 
clusters.resize(initialCentres.size()); 

Mutex lock = Mutex(); 
// cv::Range is half-open, [start, end), so this covers every descriptor row. 
parallel_for_(cv::Range(0, _descriptors.rows), 
     for_createClusters(clusters, initialCentres, icovar, _descriptors, lock)); 

Mat vocabulary; 
Mat centre = Mat::zeros(1,_descriptors.cols,_descriptors.type()); 
// Same half-open convention: the end of the range is the number of clusters. 
parallel_for_(cv::Range(0, (int)clusters.size()), for_estimateCentres(clusters, 
     vocabulary, centre, minSize, lock)); 
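cv::Range treats start as inclusive and end as exclusive, and the loop bodies iterate with f != r.end, so a range meant to cover all n items has to end at n. The sketch below uses only the standard library to show which indices a chunked loop visits for a given end. splitRange and visitCounts are hypothetical helpers, not OpenCV functions, and the chunking scheme is an assumption about how a parallel-for scheduler might divide the work.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Split the half-open range [0, n) into at most `parts` contiguous
// half-open chunks [start, end).
std::vector<std::pair<int, int>> splitRange(int n, int parts) {
    std::vector<std::pair<int, int>> chunks;
    if (n <= 0 || parts <= 0) return chunks;
    const int step = (n + parts - 1) / parts;  // ceiling division
    for (int start = 0; start < n; start += step)
        chunks.emplace_back(start, std::min(start + step, n));
    return chunks;
}

// Count how often each of `total` items is visited when the loop body
// runs over [0, rangeEnd) chunk by chunk. Requires rangeEnd <= total.
std::vector<int> visitCounts(int total, int rangeEnd, int parts) {
    std::vector<int> counts(total, 0);
    for (const auto& chunk : splitRange(rangeEnd, parts))
        for (int f = chunk.first; f != chunk.second; ++f)
            ++counts[f];
    return counts;
}
```

With 10 items, visitCounts(10, 10, 4) visits every index exactly once, while visitCounts(10, 9, 4) never touches index 9.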

And, in the header:

//parallel_for_ for creating clusters: 
class CV_EXPORTS for_createClusters: public ParallelLoopBody { 
private: 

std::vector<std::list<cv::Mat> >& bufferCluster; 
const std::vector<Mat> initCentres; 
const Mat icovar; 
const Mat descriptorsParallel; 
Mutex& lock_for; 

public: 
for_createClusters(std::vector<std::list<cv::Mat> >& _buffCl, 
     const std::vector<Mat> _initCentres, const Mat _icovar, 
     const Mat _descriptors, Mutex& _lock_for) 
: bufferCluster (_buffCl), initCentres(_initCentres), icovar(_icovar), 
    descriptorsParallel(_descriptors), lock_for(_lock_for){} 


virtual void operator()(const cv::Range &r) const 
{ 
    for (int f = r.start; f != r.end; ++f) 
    { 
     int index = 0; double dist = 0, minDist = DBL_MAX; 
     for (size_t j = 0; j < initCentres.size(); j++) { 
      dist = cv::Mahalanobis(descriptorsParallel.row(f), 
        initCentres[j],icovar); 
      if (dist < minDist) { 
       minDist = dist; 
       index = (int)j; 
      } 
     } 
     { 
//    AutoLock Lock(lock_for); 
      lock_for.lock(); 
      bufferCluster[index].push_back(descriptorsParallel.row(f)); 
      lock_for.unlock(); 
     } 
    } 
    } 
}; 

class CV_EXPORTS for_estimateCentres: public ParallelLoopBody { 
private: 

const std::vector<std::list<cv::Mat> > bufferCluster; 
Mat& vocabulary; 
const Mat centre; 
const int minSizCl; 
Mutex& lock_for; 

public: 
for_estimateCentres(const std::vector<std::list<cv::Mat> > _bufferCluster, 
     Mat& _vocabulary, const Mat _centre, const int _minSizCl, Mutex& _lock_for) 
: bufferCluster(_bufferCluster), vocabulary(_vocabulary), 
    centre(_centre), minSizCl(_minSizCl), lock_for(_lock_for){} 

virtual void operator()(const cv::Range &r) const 
{ 
    Mat ctr = Mat::zeros(1, centre.cols,centre.type()); 

    for (int f = r.start; f != r.end; ++f){ 
     ctr.setTo(0); 
     //Not taking into account small clusters 
     if(bufferCluster[f].size() >= (size_t) minSizCl) 
     { 
      for (std::list<cv::Mat>::const_iterator 
        Ci = bufferCluster[f].begin(); 
        Ci != bufferCluster[f].end(); Ci++) 
         ctr += *Ci; 

      ctr /= (double)bufferCluster[f].size(); 

      { 
//    AutoLock Lock(lock_for); 
       lock_for.lock(); 
       vocabulary.push_back(ctr); 
       lock_for.unlock(); 
      } 
     } 
    } 
    } 
}; 
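An alternative that avoids the shared mutex in the assignment step is to have each thread write only the index of the nearest centre into a preallocated array, and to build the clusters in a sequential pass afterwards. The following is a minimal standard-library sketch of that idea, not the OpenCV implementation: it assumes descriptors are plain std::vector<double> rows rather than cv::Mat, uses std::thread instead of parallel_for_, and assignToCentres and groupByCentre are hypothetical names. Because icovar is the identity matrix here, the Mahalanobis distance reduces to the Euclidean distance, so the sketch compares squared Euclidean distances.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

using Descriptor = std::vector<double>;

// Squared Euclidean distance; squaring does not change which centre is nearest.
double squaredDistance(const Descriptor& a, const Descriptor& b) {
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double d = a[k] - b[k];
        sum += d * d;
    }
    return sum;
}

// For every descriptor, find the index of its nearest centre. Each thread
// writes only to its own slice of `nearest`, so no mutex is needed.
std::vector<int> assignToCentres(const std::vector<Descriptor>& descriptors,
                                 const std::vector<Descriptor>& centres,
                                 unsigned numThreads) {
    const std::size_t n = descriptors.size();
    std::vector<int> nearest(n, -1);
    if (numThreads == 0) numThreads = 1;
    const std::size_t step = (n + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    for (std::size_t start = 0; start < n; start += step) {
        const std::size_t end = std::min(start + step, n);
        workers.emplace_back([&, start, end] {
            for (std::size_t i = start; i < end; ++i) {
                double best = std::numeric_limits<double>::max();
                for (std::size_t j = 0; j < centres.size(); ++j) {
                    const double d = squaredDistance(descriptors[i], centres[j]);
                    if (d < best) { best = d; nearest[i] = static_cast<int>(j); }
                }
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return nearest;
}

// Sequential pass: collect descriptor indices per centre, in original order.
std::vector<std::vector<std::size_t>> groupByCentre(const std::vector<int>& nearest,
                                                    std::size_t numCentres) {
    std::vector<std::vector<std::size_t>> clusters(numCentres);
    for (std::size_t i = 0; i < nearest.size(); ++i)
        if (nearest[i] >= 0) clusters[nearest[i]].push_back(i);
    return clusters;
}
```

The grouping pass is linear in the number of descriptors, so it is cheap next to the distance computations, and the clusters come out in a deterministic order regardless of thread scheduling.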

Hope this helps someone...

+0

Are you sure this is thread-safe? I tried the mutex trick and it did not work for me. The vector was still missing some items.