OpenMP iteration for loop in parallel region

Sorry if the title is unclear. I didn't quite know how to word it.

I'm wondering if there is any way to do the following:

#pragma omp parallel
{
    for (int i = 0; i < iterations; i++) {
        #pragma omp for
        for (int j = 0; j < N; j++)
            // Do something
    }
}

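Ignoring things such as the omitted private specifiers on the for loop, is there any way I can fork threads outside my outer loop so that only the inner loop is parallelized? From my understanding (please correct me if I'm wrong), all threads will execute the outer loop. I'm unsure about the behavior of the inner loop, but I think the for will distribute chunks to each thread that encounters it.

What I want is to avoid forking/joining iterations times and instead do it just once, around the outer loop. Is this the right strategy?

What if there were another outer loop that shouldn't be parallelized? That is...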

#pragma omp parallel
{
    for (int i = 0; i < iterations; i++) {
        for (int k = 0; k < innerIterations; k++) {
            #pragma omp for
            for (int j = 0; j < N; j++)
                // Do something

            // Do something else
        }
    }
}

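It would be great if someone could point me to an example of a large application parallelized with OpenMP, so that I could better understand the strategies to employ when using OpenMP. I can't seem to find any.

Clarification: I'm looking for solutions that do not change the loop ordering or involve blocking, caching, and general performance considerations. I want to understand how this can be done in OpenMP on the loop structure as specified. The // Do something may or may not have dependencies; assume that it does and that you can't move things around.

Source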

2013-05-08 Pochi

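Maybe you could give an example of what you want to do. I mean, fill in the code for // Do something – 2013-05-08 15:36:54

@raxman, that wouldn't help. This is meant to ask for a general solution to the problem, not a solution for a specific application. – Pochi 2013-05-08 16:05:18

You could go ahead and upvote/accept some answers. It seems people put in some effort and all got minimal upvotes. – 2015-10-14 15:28:47

I'm not sure I can answer your question. I have only been using OpenMP for a few months, but when I try to answer questions like this I run some hello-world printf tests like the one below. I think that might help answer your questions. Also try #pragma omp for nowait and see what happens.

Just make sure that in "// Do something" and "// Do something else" you don't write to the same memory address and create a race condition. Also, if you're doing a lot of reading and writing, you need to think about how to use the cache efficiently.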

#include <stdio.h>
#include <omp.h>

void loop(const int iterations, const int N) {
    #pragma omp parallel
    {
        // Printed once per thread: the team is created a single time.
        int start_thread = omp_get_thread_num();
        printf("start thread %d\n", start_thread);
        for (int i = 0; i < iterations; i++) {
            // Every thread executes every iteration of the outer loop.
            printf("\titeration %d, thread num %d\n", i, omp_get_thread_num());
            // The inner iterations are divided among the threads.
            #pragma omp for
            for (int j = 0; j < N; j++) {
                printf("\t\t inner loop %d, thread num %d\n", j, omp_get_thread_num());
            }
        }
    }
}

int main() {
    loop(2, 30);
}
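
To make the race-condition warning concrete, here is a minimal sketch that is not taken from the question. It assumes the "Do something" step accumulates into one shared variable, which would be a data race if every thread updated it directly. The function name multi_pass_sum and its parameters are hypothetical. The reduction clause gives each thread a private partial sum and combines the partial sums at the end of each inner loop.

```c
#include <assert.h>

/* Hypothetical example: add up all n elements of x, once per pass.
 * total is declared outside the parallel region, so it is shared. */
long multi_pass_sum(const long *x, int n, int iterations)
{
    assert(n >= 0 && iterations >= 0);
    long total = 0;

    #pragma omp parallel
    {
        /* Every thread runs the outer loop; i is private to each thread. */
        for (int i = 0; i < iterations; i++) {
            /* A plain "total += x[j]" on the shared variable would be a
             * race. With the reduction clause each thread sums into a
             * private copy, and the copies are combined into total. */
            #pragma omp for reduction(+:total)
            for (int j = 0; j < n; j++) {
                total += x[j];
            }
            /* implicit barrier: the combined total is complete here */
        }
    }
    return total;
}
```

Called with x = {1, 2, 3, 4}, n = 4 and iterations = 3, this returns 30 whether it is built with or without OpenMP support.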

In terms of performance, you might want to consider fusing the loops like this.

// Must be inside a parallel region; standalone, use "#pragma omp parallel for".
#pragma omp for
for (int n = 0; n < iterations*N; n++) {
    int i = n / N;
    int j = n % N;
    // do something as a function of the indices i and j
}
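
As a sketch of the fused loop in context, the hypothetical functions below fill a table whose entries depend only on i and j. This assumes the passes are completely independent of one another, which the question's clarification rules out in general, so it only applies when that assumption holds. The second function uses the collapse(2) clause (OpenMP 3.0 and later), which asks the runtime to fuse the two loops itself. The names fill_fused and fill_collapse and the value stored in out are made up for illustration.

```c
#include <assert.h>

/* Fused version: one loop over iterations*N, indices recovered by / and %.
 * Assumes iterations*N fits in an int. */
void fill_fused(int *out, int iterations, int N)
{
    assert(iterations >= 0 && N >= 0);
    #pragma omp parallel for
    for (int n = 0; n < iterations * N; n++) {
        int i = n / N;   /* pass index */
        int j = n % N;   /* element index */
        out[n] = 100 * i + j;
    }
}

/* Same work, letting OpenMP combine the two loops into one iteration space. */
void fill_collapse(int *out, int iterations, int N)
{
    assert(iterations >= 0 && N >= 0);
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < iterations; i++) {
        for (int j = 0; j < N; j++) {
            out[i * N + j] = 100 * i + j;
        }
    }
}
```

Both functions write the same values to out; collapse(2) avoids the explicit division and modulo at the cost of requiring the two loops to be perfectly nested.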

Source

2013-05-08 10:12:09

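This is hard to answer, since it really depends on the dependencies in your code. But a general way to solve this is to invert the nesting of the loops, like this: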

#pragma omp parallel
{
    #pragma omp for
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < iterations; i++) {
            // Do something
        }
    }
}

Of course, this may or may not be possible, depending on the code inside the loop.
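
As an illustration of when the inversion is legal, here is a minimal sketch under a strong assumption: each element is updated only from its own previous value, with no reads of neighbouring elements. The function names and the update rule are hypothetical. Under that assumption both orderings apply the same sequence of operations to each element, so they give identical results.

```c
#include <assert.h>

/* Question's ordering: passes outside, elements inside (work-shared). */
void passes_outer(double *x, int n, int iterations)
{
    assert(n >= 0 && iterations >= 0);
    #pragma omp parallel
    {
        for (int i = 0; i < iterations; i++) {
            #pragma omp for
            for (int j = 0; j < n; j++) {
                x[j] = 0.5 * x[j] + 1.0;
            } /* implicit barrier after every pass */
        }
    }
}

/* Inverted ordering: elements outside (work-shared), passes inside.
 * Only valid because x[j] never depends on any other element. */
void passes_inner(double *x, int n, int iterations)
{
    assert(n >= 0 && iterations >= 0);
    #pragma omp parallel for
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < iterations; i++) {
            x[j] = 0.5 * x[j] + 1.0;
        }
    }
}
```

If the update read x[j - 1] or x[j + 1], the inverted version would compute something different, which is the case the question's clarification describes.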

Source

2013-05-08 13:14:06

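The way you handled the two for loops looks right to me, in the sense that it achieves the behavior you wanted: the outer loop is not parallelized, while the inner loop is.

To better clarify what happens, I'll add some notes to your code: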

#pragma omp parallel
{
    // Here you have a certain number of threads, let's say M
    for (int i = 0; i < iterations; i++) {
        // Each thread enters this region and executes all the iterations
        // from i = 0 to i < iterations. Note that i is a private variable.
        #pragma omp for
        for (int j = 0; j < N; j++) {
            // What happens here is shared among threads so,
            // according to the scheduling you choose, each thread
            // will execute a particular portion of your N iterations
        } // IMPLICIT BARRIER
    }
}

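The implicit barrier is a synchronization point where threads wait for one another. As a general rule of thumb, it is therefore preferable to parallelize outer loops rather than inner loops, since that creates a single synchronization point for all iterations*N iterations (instead of the iterations synchronization points you are creating above).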
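
To show where the barriers fall when the passes really do depend on each other, here is a minimal sketch rather than a definitive implementation. It assumes a simple neighbour-averaging update with two buffers; the function smooth, the buffers src and dst, and the update rule are all hypothetical. The "Do something else" step from the question is modelled as a buffer swap performed by exactly one thread inside a single construct.

```c
#include <assert.h>

/* Runs `iterations` dependent passes over n elements, alternating between
 * buffers a and b. Returns the buffer that holds the final result. */
double *smooth(double *a, double *b, int n, int iterations)
{
    assert(n > 0 && iterations >= 0);
    double *src = a;   /* shared: buffer read during the current pass    */
    double *dst = b;   /* shared: buffer written during the current pass */

    #pragma omp parallel
    {
        /* The team is created once; every thread runs the outer loop. */
        for (int i = 0; i < iterations; i++) {
            #pragma omp for
            for (int j = 0; j < n; j++) {
                double left  = src[j > 0 ? j - 1 : j];
                double right = src[j < n - 1 ? j + 1 : j];
                dst[j] = 0.5 * (left + right);
            } /* implicit barrier: the whole pass has been written */

            /* "Do something else": one thread swaps the buffers. */
            #pragma omp single
            {
                double *tmp = src;
                src = dst;
                dst = tmp;
            } /* implicit barrier: all threads see the swapped pointers */
        }
    }
    return src;
}
```

This costs two implicit barriers per pass but only one fork/join for the whole computation. Neither barrier can be dropped with nowait in this sketch, because each pass reads values that other threads wrote in the previous pass.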

Source

2013-05-08 19:19:29 Massimiliano

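The outer loop is supposed to represent multiple passes of some algorithm, so it cannot be parallelized. Sorry if I was unclear. – Pochi 2013-05-08 23:18:30

The outer loop is not parallelized, because there is no work-sharing directive on it – Massimiliano 2013-05-09 06:50:04

If you ran the "hello world printf" test using the code I suggested, it would show you all of this. You can see that the barrier is removed if you add the nowait clause. In other words, without nowait the outer loop is not parallelized, and with it, it is. – 2013-05-09 07:14:23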
