2017-02-15 90 views
1

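Selectively enabling an OpenMP for loop inside a parallel region

Is it possible to selectively enable an OpenMP directive with a template parameter or a run-time variable?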

This (all threads share the iterations of the same for loop):
#pragma omp parallel 
{ 
    #pragma omp for 
    for (int i = 0; i < 10; ++i) { /*...*/ } 
} 
versus this (each thread runs the entire for loop on its own):
#pragma omp parallel 
{ 
    for (int i = 0; i < 10; ++i) { /*...*/ } 
} 

Update (testing the if clause)

test.cpp:

#include <iostream> 
#include <omp.h> 

int main() { 
    bool var = true; 
    #pragma omp parallel 
    { 
     #pragma omp for if (var) 
     for (int i = 0; i < 4; ++i) { 
      std::cout << omp_get_thread_num() << "\n"; 
     } 
    } 
} 

Error message (g++ 6, compiled with g++ test.cpp -fopenmp):

test.cpp: In function ‘int main()’: 
test.cpp:8:25: error: ‘if’ is not valid for ‘#pragma omp for’ 
     #pragma omp for if (var) 
         ^~ 
+1

'#pragma omp parallel if(variable)' –

+0

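Both versions are parallel; most of the time I want to selectively enable the '#pragma omp for' line. I will try to find out whether the if clause works together with the for directive. Thanks. –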

+0

It does. https://msdn.microsoft.com/en-us/library/5187hzke.aspx Hopefully that holds for all compilers. –

Answers

0

This sort of works. I don't know whether it is possible to get rid of the conditional used to obtain the thread ID.

#include <iostream> 
#include <omp.h> 
#include <sstream> 
#include <vector> 
int main() { 
    constexpr bool var = true; 
    int n_threads = omp_get_num_procs(); 
    std::cout << "n_threads: " << n_threads << "\n"; 
    std::vector<std::stringstream> s(omp_get_num_procs()); 

    #pragma omp parallel if (var) 
    { 

     const int thread_id0 = omp_get_thread_num(); 
     #pragma omp parallel 
     { 
      int thread_id1; 
      if (var) { 
       thread_id1 = thread_id0; 
      } else { 
       thread_id1 = omp_get_thread_num(); 
      } 

      #pragma omp for 
      for (int i = 0; i < 8; ++i) { 
       s[thread_id1] << i << ", "; 
      } 
     } 
    } 

    for (int i = 0; i < s.size(); ++i) { 
     std::cout << "thread " << i << ": " 
        << s[i].str() << "\n"; 
    } 
} 

Output (when var == true):

n_threads: 8 
thread 0: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 1: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 2: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 3: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 4: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 5: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 6: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 7: 0, 1, 2, 3, 4, 5, 6, 7, 

Output (when var == false):

n_threads: 8 
thread 0: 0, 
thread 1: 1, 
thread 2: 2, 
thread 3: 3, 
thread 4: 4, 
thread 5: 5, 
thread 6: 6, 
thread 7: 7, 
+0

This works with clang and g++. I don't know about the Intel compiler. –

+0

It will not work as expected if nested parallelism is enabled. –

0
#include <omp.h> 
#include <sstream> 
#include <vector> 
#include <iostream> 
int main() { 
    constexpr bool var = false; 
    int n_threads = omp_get_num_procs(); 
    std::cout << "n_threads: " << n_threads << "\n"; 
    std::vector<std::stringstream> s(omp_get_num_procs()); 

    #pragma omp parallel 
    { 
     const int thread_id = omp_get_thread_num(); 
     if (var) { 
      #pragma omp for 
      for (int i = 0; i < 8; ++i) { 
       s[thread_id] << i << ", "; 
      } 
     } else { 
      for (int i = 0; i < 8; ++i) { 
       s[thread_id] << i << ", "; 
      } // code duplication 
     } 
    } 
    for (int i = 0; i < s.size(); ++i) { 
     std::cout << "thread " << i << ": " 
        << s[i].str() << "\n"; 
    } 
} 
+1

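You do realise that the code in the 'else' block actually creates a nested parallel region, which could lead to surprising results? The only reason it may appear to work as the OP intended is that nested parallelism is disabled by default, so that region is executed serially within each thread. –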

+0

Thanks. I fixed this by removing the '#pragma omp parallel for' in the 'else' block. –

+0

Sorry, I didn't realise you were the OP. You really should combine your answers into one. –

1

I think the idiomatic C++ solution is to hide the different OpenMP pragmas behind algorithm overloads.

#include <iostream> 
#include <sstream> 
#include <vector> 
#include <omp.h> 

#include <type_traits> 
template <bool ALL_PARALLEL> 
struct impl; 

template<> 
struct impl<true> 
{ 
    template<typename ITER, typename CALLABLE> 
    void operator()(ITER begin, ITER end, const CALLABLE& func) { 
    #pragma omp parallel 
    { 
     for (ITER i = begin; i != end; ++i) { 
     func(i); 
     } 
    } 
    } 
}; 

template<> 
struct impl<false> 
{ 
    template<typename ITER, typename CALLABLE> 
    void operator()(ITER begin, ITER end, const CALLABLE& func) { 
    #pragma omp parallel for 
    for (ITER i = begin; i < end; ++i) { 
     func(i); 
    } 
    } 
}; 

// This is just so we don't have to write parallel_foreach()(...) 
template <bool ALL_PARALLEL, typename ITER, typename CALLABLE> 
void parallel_foreach(ITER begin, ITER end, const CALLABLE& func) 
{ 
    impl<ALL_PARALLEL>()(begin, end, func); 
} 

int main() 
{ 
    constexpr bool var = false; 
    int n_threads = omp_get_num_procs(); 
    std::cout << "n_threads: " << n_threads << "\n"; 
    std::vector<std::stringstream> s(omp_get_num_procs()); 

    parallel_foreach<var>(0, 8, [&s](auto i) { 
     s[omp_get_thread_num()] << i << ", "; 
    }); 

    for (int i = 0; i < s.size(); ++i) { 
     std::cout << "thread " << i << ": " 
        << s[i].str() << "\n"; 
    } 
} 

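If you work with specific types, you can overload on the type instead of using a bool template parameter, and iterate over the elements rather than over a numeric index. Note that you can use C++ random-access iterators in OpenMP work-sharing loops! Depending on your types, you may well be able to implement an iterator that hides all of the internal data access from the caller.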

+0

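I thought the overhead of iterators was quite large: http://stackoverflow.com/questions/2513988/iteration-through-std-containers-in-openmp I don't know whether that is still the case. After reading it, I avoided writing iterators for classes that would be used in OpenMP for loops. –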

+2

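You misread the linked answer. The example given there is 'std::set', which does not have random-access iterators. That is why he does not use the loop work-sharing construct ('#pragma omp (parallel) for') but a hand-written loop instead. There is no inherent overhead in using a plain '#pragma omp for' on random-access iterators. Your optimization mileage may vary, so measure and compare. – Zulan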

+0

Thanks. I guess I'll add random-access iterators in my next project... –