Openmp基本並行化

我已經寫了一些使用OpenMP並行課程的並行c代碼。Openmp基本並行化

繼承人片段

#include <stdio.h> 
#include <time.h> 
#include <math.h> 

#define FALSE 0 
#define TRUE 1 

int count_primes_0(int); 
int count_primes_1(int); 
int count_primes_2(int); 

int main(int argc, char *argv[]){ 
    int n; 

    if (argc != 2){ 
     printf("Incorrect Invocation, use: \nq1 N"); 
     return 0; 
    } else { 
     n = atoi(argv[1]); 
    } 

    if (n < 0){ 
     printf("N cannot be negative"); 
     return 0; 
    } 

    printf("N = %d\n", n); 

    //omp_set_num_threads(1); 
    time_it(count_primes_0, n, "Method 0"); 
    time_it(count_primes_1, n, "Method 1"); 
    time_it(count_primes_2, n, "Method 2"); 

    return 0; 
} 

int is_prime(int n){ 
    for(int i = 2; i <= (int)(sqrt((double) n)); i++){ 
     if ((n % i) == 0){ 
      return FALSE; 
     } 
    } 

    return n > 1; 
} 

void time_it(int (*f)(int), int n, char *string){ 
    clock_t start_clock; 
    clock_t end_clock; 
    double calc_time; 
    int nprimes; 

    struct timeval start_val; 
    struct timeval end_val; 

    start_clock = clock(); 
    nprimes = (*f)(n); 
    end_clock = clock(); 
    calc_time = ((double)end_clock - (double)start_clock)/CLOCKS_PER_SEC; 
    printf("\tNumber of primes: %d \t Time taken: %fs\n\n", nprimes, calc_time); 
} 

// METHOD 0 
// Base Case no parallelization 
int count_primes_0(int n){ 
    int nprimes = 0; 

    for(int i = 1; i <= n; i++){ 
     if (is_prime(i)) { 
      nprimes++; 
     } 
    } 

    return nprimes; 
} 

//METHOD 1 
// Use only For and Critical Constructs 
int count_primes_1(int n){ 
    int nprimes = 0; 

    #pragma omp parallel for 
    for(int i = 1; i <= n; i++){ 
     if (is_prime(i)) { 
      #pragma omp critical 
      nprimes++; 
     } 
    } 

    return nprimes; 
} 

//METHOD 2 
// Use Reduction 
int count_primes_2(int n){ 
    int nprimes = 0; 

    #pragma omp parallel for reduction(+:nprimes) 
    for(int i = 1; i <= n; i++){ 
     if (is_prime(i)) { 
      nprimes++; 
     } 
    } 

    return nprimes; 
}

我現在面臨的問題是，當我使用OMP_SET_NUM_THREADS（）少的線程我用更快我的功能運行 - 或者更接近基地的運行時間並行化的情況下

時間結果：方法0：這些一個8芯機

8個線程上運行0.07s;方法1：1.63s;方法2：1.4s

4主題：方法0：0.07s;方法1：0.16s;方法2：0.16s

2主題：方法0：0.07s;方法1：0.10;方法2：0.09

1主題：方法0：0.07s;方法1：0.08s;方法2：0.07s

我已經試過禁用優化，並使用不同的gcc版本沒有區別

任何幫助表示讚賞。

編輯：在Linux中使用時鐘返回'不正確的'時間，掛鐘時間是我所需要的，所以使用ether omp_get_wtime（）或Linux函數timeit會產生正確的結果。

來源

2011-02-12 Ciaran Liedeman

您可以發佈不同NUM_THREADS的時序結果？ – CharlesB 2011-02-12 17:24:08

你在多核機器上運行嗎？這個代碼將受到CPU限制（與內存綁定或IO綁定相對），所以如果多線程能夠在問題上拋出更多內核，它將只會改進。 – 2011-02-12 17:29:46

你沒有長時間運行你的實驗，所以有可能你的OMP時代實際上是由產生和殺死線程所支配的。嘗試運行整個事情1000次，並計時整個事情。 – 2011-02-12 17:40:06

我很驚訝你已經看到上述程序的任何成功。如果您查看clock（）的RedHat Linux手冊頁，您會發現它「返回程序使用的處理器時間的近似值」。放入OpenMP指令會導致更多開銷，因此在運行OpenMP時應該會看到更多的總體處理器時間。你需要看的是過去的時間（或掛鐘時間）。當你並行運行（並且你有一個可以從並行中受益的程序）時，你會看到流逝的時間減少。 OpenMP規範定義了一個例程（omp_get_wtime（））來提供這些信息。

更改程序中使用的時鐘（）和omp_get_wtime（）報告：

$ a.out的1000000（1,000,000）

2處理器：

時鐘（）：0.23 wtime（）： 0.23時鐘（）：0.96 wtime（）：0.16時鐘（）：0.59 wtime（）：0.09

4個處理器：

時鐘（）：0.24 wtime（）：0.24時鐘（）：0.97 wtime（）：0.16時鐘（）：0.57 wtime（）：0.09

8個處理器：

時鐘（）：0.24 wtime（）：0.24時鐘（）：2.60 wtime（）：0.26時鐘（）：0.64 wtime（）：0 。09

$ a.out的10000000（10,000,000）

2處理器：

時鐘（）：6.07 wtime（）：6.07時鐘（）：10.4 wtime（）：1.78時鐘（）：11.3 wtime （）：1.65

4個處理器：

時鐘（）：6.07 wtime（）：6.07時鐘（）：11.5 wtime（）：1.71時鐘（）：10.7 wtime（）：1.72

8處理器：

時鐘（）：6.07 wtime（）：6.07時鐘（）：9.92 wtime（）：1.83時鐘（）：11.9 wtime（）：1.86

來源

2011-02-14 15:26:10 ejd

除非參數是私有的，否則OpenMP不會並行化其中包含函數調用的循環。解決方案是在循環中內聯is_prime()。

來源

2011-02-12 17:30:14 CharlesB

Openmp基本並行化

回答

相關問題