Selectively enabling an OpenMP for loop inside a parallel region

Is it possible to selectively enable an OpenMP directive with a template parameter or a run-time variable? That is, switch between

this (all threads work on the same for loop):
#pragma omp parallel 
{ 
    #pragma omp for 
    for (int i = 0; i < 10; ++i) { /*...*/ } 
} 
versus this (each thread works on its own for loop):
#pragma omp parallel 
{ 
    for (int i = 0; i < 10; ++i) { /*...*/ } 
}

Update (testing the if clause)

test.cpp:

#include <iostream> 
#include <omp.h> 

int main() { 
    bool var = true; 
    #pragma omp parallel
    {
        #pragma omp for if (var)
        for (int i = 0; i < 4; ++i) {
            std::cout << omp_get_thread_num() << "\n";
        }
    }
}

Error message (g++ 6, compiled with g++ test.cpp -fopenmp):

test.cpp: In function ‘int main()’: 
test.cpp:8:25: error: ‘if’ is not valid for ‘#pragma omp for’ 
     #pragma omp for if (var) 
         ^~

Source

2017-02-15 hamster on wheels

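'#pragma omp parallel if (variable)' –

Both versions are parallel; mostly I want to selectively enable the '#pragma omp for' line. I will try to find out whether the if clause can work together with the for directive. Thanks. –

It does. https://msdn.microsoft.com/en-us/library/5187hzke.aspx Hopefully that is true for all compilers. –

Sort of works. I don't know whether it is possible to get rid of the conditional for getting the thread ID.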

#include <iostream>
#include <omp.h>
#include <sstream>
#include <vector>

int main() {
    constexpr bool var = true;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    // var == true:  the outer region is parallel and the inner one is
    //               serialized, so every thread runs the whole loop.
    // var == false: the outer region is serial and the inner one shares the loop.
    // Both cases assume that nested parallelism is disabled.
    #pragma omp parallel if (var)
    {
        const int thread_id0 = omp_get_thread_num();
        #pragma omp parallel
        {
            int thread_id1;
            if (var) {
                thread_id1 = thread_id0;
            } else {
                thread_id1 = omp_get_thread_num();
            }

            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id1] << i << ", ";
            }
        }
    }

    for (std::size_t i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": "
                  << s[i].str() << "\n";
    }
}

Output (when var == true):

n_threads: 8 
thread 0: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 1: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 2: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 3: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 4: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 5: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 6: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 7: 0, 1, 2, 3, 4, 5, 6, 7,

Output (when var == false):

n_threads: 8 
thread 0: 0, 
thread 1: 1, 
thread 2: 2, 
thread 3: 3, 
thread 4: 4, 
thread 5: 5, 
thread 6: 6, 
thread 7: 7,

Source

2017-02-15 17:14:50

This works with clang and g++. I don't know about the Intel compiler. –

It will not work as expected if nested parallelism is enabled. –

#include <omp.h>
#include <sstream>
#include <vector>
#include <iostream>

int main() {
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();
        if (var) {
            // Worksharing loop: the iterations are split among the threads.
            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            }
        } else {
            // Plain loop: every thread runs all iterations.
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            } // code duplication
        }
    }
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": "
                  << s[i].str() << "\n";
    }
}

Source

2017-02-15 18:23:13

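You realize that the code in the 'else' block actually creates a nested parallel region, which may lead to surprising results? The only reason it may appear to work as the OP intends is that nested parallelism is disabled by default, so that region is executed serially in each thread. –

Thanks. I fixed this by removing the '#pragma omp parallel for' in the 'else' block. –

Sorry, I didn't realize you were the OP. You should really combine your answers into one. –

I think the idiomatic C++ solution is to hide the different OpenMP pragmas behind algorithm overloads.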

#include <iostream> 
#include <sstream> 
#include <vector> 
#include <omp.h> 

#include <type_traits> 
template <bool ALL_PARALLEL>
struct impl;

// ALL_PARALLEL == true: every thread in the team runs the complete loop.
template <>
struct impl<true>
{
    template <typename ITER, typename CALLABLE>
    void operator()(ITER begin, ITER end, const CALLABLE& func) {
        #pragma omp parallel
        {
            for (ITER i = begin; i != end; ++i) {
                func(i);
            }
        }
    }
};

// ALL_PARALLEL == false: the iterations are shared among the threads.
template <>
struct impl<false>
{
    template <typename ITER, typename CALLABLE>
    void operator()(ITER begin, ITER end, const CALLABLE& func) {
        #pragma omp parallel for
        for (ITER i = begin; i < end; ++i) {
            func(i);
        }
    }
};

// This is just so we don't have to write impl<...>()(...)
template <bool ALL_PARALLEL, typename ITER, typename CALLABLE>
void parallel_foreach(ITER begin, ITER end, const CALLABLE& func)
{
    impl<ALL_PARALLEL>()(begin, end, func);
}

int main()
{
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    // The generic lambda (auto parameter) requires C++14.
    parallel_foreach<var>(0, 8, [&s](auto i) {
        s[omp_get_thread_num()] << i << ", ";
    });

    for (std::size_t i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": "
                  << s[i].str() << "\n";
    }
}

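If you work with specific types, you could overload on the type instead of using a bool template parameter, and iterate over elements rather than over a numerically indexed loop. Note that you can use C++ random access iterators in OpenMP worksharing loops! Depending on your types, you may well be able to implement an iterator that hides all the internal data access from the caller.

Source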

2017-02-15 19:19:57 Zulan

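I thought the overhead was quite large for iterators: http://stackoverflow.com/questions/2513988/iteration-through-std-containers-in-openmp Not sure whether that is still the case. After reading that, I avoided writing iterators for classes that are used in OpenMP for loops. –

You misread the linked answer. The example given there is 'std::set', which does not have random access iterators. Therefore it does not use the loop worksharing construct ('#pragma omp (parallel) for') but a hand-written loop. If you use a plain '#pragma omp for' on random access iterators, there is no inherent overhead. Your optimization mileage may vary, so measure and compare. – Zulan

Thanks. I guess I will add random access iterators in my next project... –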
