2012-12-07 83 views
2

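std::inner_product with omp

Is it possible to parallelize std::inner_product() in C++ with the omp.h library? Unfortunately, I cannot use the __gnu_parallel::inner_product() that is available in newer versions of gcc. I know I could implement my own inner_product and parallelize it, but I would like to use standard facilities.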

+1

You can use OpenMP :)

Answers

2

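Short answer: no.

The whole point of algorithms like inner_product is that they abstract the loop away from you. But to parallelize the algorithm you need to parallelize that loop, either with #pragma omp parallel for or with parallel sections. Both approaches are inherently tied to the loop in the code structure, so even if the loop can be parallelized (it very likely can), you would need to put the OpenMP pragma inside the function itself for the parallelism to apply to it.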
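To make the point above concrete, here is a minimal sketch of what "putting the pragma inside the function" looks like when you write the loop yourself. The name `parallel_inner_product` and the use of `std::vector<long>` are assumptions made for illustration; this is not a standard or OpenMP-provided function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the standard library or OpenMP):
// a hand-written inner product whose loop is visible to OpenMP.
long parallel_inner_product(const std::vector<long>& a,
                            const std::vector<long>& b,
                            long init)
{
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    long sum = 0;

    // Each thread accumulates a private partial sum; OpenMP combines
    // the partial sums when the loop ends. If the compiler is run
    // without OpenMP support, the pragma is ignored and the loop
    // simply runs serially.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return init + sum;
}
```

Calling `parallel_inner_product(a, b, 0L)` would stand in for `std::inner_product(a.begin(), a.end(), b.begin(), 0L)`. With gcc, OpenMP is enabled with `-fopenmp`.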

2

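Following Hristo's comment, you can sort of do this by decomposing the array across threads, calling inner_product on each sub-array, and then using a reduction operation to combine the sub-results: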

#include <iostream> 
#include <numeric> 
#include <omp.h> 

#include <sys/time.h> 
void tick(struct timeval *t); 
double tock(struct timeval *t); 

int main (int argc, char **argv) { 
    const long int nelements=1000000; 
    long int *a = new long int[nelements]; 
    long int *b = new long int[nelements]; 
    int nthreads; 
    long int sum = 0; 
    struct timeval t; 
    double time; 

    #pragma omp parallel for 
    for (long int i=0; i<nelements; i++) { 
     a[i] = i+1; 
     b[i] = 1; 
    } 

    tick(&t); 
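    // nelements is const: newer gcc versions no longer treat it as
    // predetermined shared under default(none), so list it explicitly.
    #pragma omp parallel default(none) reduction(+:sum) shared(a,b,nthreads) firstprivate(nelements)
    {
     // Query the team size inside the region that does the work, so the
     // decomposition always matches the actual number of threads.
     #pragma omp single
     nthreads = omp_get_num_threads();   // implicit barrier after single

     const long int tid = omp_get_thread_num();
     const long int nitems = nelements/nthreads;
     const long int start = tid*nitems;
     // The last thread also takes any remainder.
     const long int end = (tid == nthreads-1) ? nelements : start + nitems;

     sum += std::inner_product(a+start, a+end, b+start, 0L);
    }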
    time = tock(&t); 

    std::cout << "using omp: sum = " << sum << " time = " << time << std::endl; 

    delete [] a; 
    delete [] b; 



    a = new long int[nelements]; 
    b = new long int[nelements]; 
    sum = 0; 

    for (long int i=0; i<nelements; i++) { 
     a[i] = i+1; 
     b[i] = 1; 
    } 
    tick(&t); 
    sum = std::inner_product(a, a+nelements, b, 0L); 
    time = tock(&t); 

    std::cout << "single threaded: sum = " << sum << " time = " << time << std::endl; 

    std::cout << "correct answer: sum = " << (nelements)*(nelements+1)/2 << std::endl ; 

    delete [] a; 
    delete [] b; 

    return 0; 
} 

void tick(struct timeval *t) { 
    gettimeofday(t, NULL); 
} 

/* returns the time in seconds elapsed since the time stored in t */
double tock(struct timeval *t) { 
    struct timeval now; 
    gettimeofday(&now, NULL); 
    return (double)(now.tv_sec - t->tv_sec) + ((double)(now.tv_usec - t->tv_usec)/1000000.); 
} 

Running this gives better speedup than I would have expected:

$ for NT in 1 2 4 8; do export OMP_NUM_THREADS=${NT}; echo; echo "NTHREADS=${NT}";./inner; done 

NTHREADS=1 
using omp: sum = 500000500000 time = 0.004675 
single threaded: sum = 500000500000 time = 0.004765 
correct answer: sum = 500000500000 

NTHREADS=2 
using omp: sum = 500000500000 time = 0.002317 
single threaded: sum = 500000500000 time = 0.004773 
correct answer: sum = 500000500000 

NTHREADS=4 
using omp: sum = 500000500000 time = 0.001205 
single threaded: sum = 500000500000 time = 0.004758 
correct answer: sum = 500000500000 

NTHREADS=8 
using omp: sum = 500000500000 time = 0.000617 
single threaded: sum = 500000500000 time = 0.004784 
correct answer: sum = 500000500000 
+0

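Running two or more inner products concurrently... Of course, you could write this without the `inner_product` call; inner products have the nice property of decomposing well like this.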

+0

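You can use `omp_get_max_threads()` instead of the trick with `omp_get_num_threads()` inside a `single` construct, because you might run into OpenMP's nice dynamic teams feature (given the appropriate environment) and get `1`.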
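One way to sidestep the question of which query to use is to read the team size inside the same parallel region that does the work, so the chunk boundaries always match the threads that are actually running. The following is only a sketch under that assumption; `chunked_inner_product` is a hypothetical name, and the `_OPENMP` guard is there so the code also builds serially.

```cpp
#include <cassert>
#include <numeric>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: splits the range into one chunk per thread and
// calls std::inner_product on each chunk.
long chunked_inner_product(const std::vector<long>& a,
                           const std::vector<long>& b)
{
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    long sum = 0;

    #pragma omp parallel reduction(+:sum)
    {
#ifdef _OPENMP
        // Team size and thread id of *this* region.
        const long nthreads = omp_get_num_threads();
        const long tid = omp_get_thread_num();
#else
        // Serial fallback when OpenMP is not enabled.
        const long nthreads = 1;
        const long tid = 0;
#endif
        const long chunk = n / nthreads;
        const long start = tid * chunk;
        // The last thread also takes any remainder.
        const long end = (tid == nthreads - 1) ? n : start + chunk;

        sum += std::inner_product(a.begin() + start, a.begin() + end,
                                  b.begin() + start, 0L);
    }
    return sum;
}
```

Because every thread computes its own bounds from the team it belongs to, the result does not depend on how many threads the runtime decides to grant.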