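std::inner_product with omp
Is it possible to parallelize C++'s std::inner_product() with the omp.h library? Unfortunately, I cannot use __gnu_parallel::inner_product() in newer versions of gcc. I know I could implement my own inner_product and parallelize that, but I would like to use standard means.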
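Answer
Short answer: no.
The whole point of algorithms like inner_product is that they abstract the loop away from you. But to parallelize the algorithm, you need to parallelize that loop, either with #pragma omp parallel for or with parallel sections. Both approaches are inherently tied to the loop in the code structure, so even if the loop can be parallelized (which it very likely can), the OpenMP pragma would have to go inside the function for the parallelism to apply to it.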
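To make that concrete, here is a minimal sketch of a hand-written inner product in which the loop is visible, so the pragma has something to attach to. The function name omp_inner_product and the choice of std::vector<long> are assumptions made for illustration; they do not come from the question or from any library.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (illustration only): a hand-written inner product
// whose loop is exposed, so an OpenMP pragma can be placed on it.
long omp_inner_product(const std::vector<long>& a,
                       const std::vector<long>& b,
                       long init) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    long sum = init;
    // Each thread accumulates a private partial sum; OpenMP combines the
    // partial sums with the original value of sum when the loop finishes.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Built without OpenMP support the pragma is ignored and the loop simply runs serially, so the result is the same either way.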
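Following Hristo's comment: you can sort of do this by decomposing the arrays across threads, calling inner_product on each sub-array, and then combining the sub-results with some kind of reduction operation: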
#include <iostream>
#include <numeric>
#include <omp.h>
#include <sys/time.h>

void tick(struct timeval *t);
double tock(struct timeval *t);

int main (int argc, char **argv) {
    const long int nelements=1000000;
    long int *a = new long int[nelements];
    long int *b = new long int[nelements];
    int nthreads;
    long int sum = 0;
    struct timeval t;
    double time;

    // Fill a with 1..nelements and b with ones, so the inner product
    // equals nelements*(nelements+1)/2.
    #pragma omp parallel for
    for (long int i=0; i<nelements; i++) {
        a[i] = i+1;
        b[i] = 1;
    }

    tick(&t);

    // Ask one thread of a parallel team how many threads there are.
    #pragma omp parallel
    #pragma omp single
    nthreads = omp_get_num_threads();

    // Each thread computes the inner product of its own contiguous chunk;
    // the reduction clause adds the per-thread results together.
    // No default(none) here: whether the const nelements must be listed
    // explicitly differs between OpenMP versions.
    #pragma omp parallel reduction(+:sum) shared(a,b,nthreads)
    {
        int tid = omp_get_thread_num();
        int nitems = nelements/nthreads;
        int start = tid*nitems;
        int end = start + nitems;
        // The last thread also takes the remainder.
        if (tid == nthreads-1) end = nelements;

        sum += std::inner_product(&(a[start]), a+end, &(b[start]), 0L);
    }

    time = tock(&t);
    std::cout << "using omp: sum = " << sum << " time = " << time << std::endl;

    delete [] a;
    delete [] b;

    // Repeat the computation serially for comparison.
    a = new long int[nelements];
    b = new long int[nelements];
    sum = 0;
    for (long int i=0; i<nelements; i++) {
        a[i] = i+1;
        b[i] = 1;
    }

    tick(&t);
    sum = std::inner_product(a, a+nelements, b, 0L);
    time = tock(&t);
    std::cout << "single threaded: sum = " << sum << " time = " << time << std::endl;
    std::cout << "correct answer: sum = " << (nelements)*(nelements+1)/2 << std::endl ;

    delete [] a;
    delete [] b;
    return 0;
}

/* records the current time in t */
void tick(struct timeval *t) {
    gettimeofday(t, NULL);
}

/* returns the time in seconds elapsed between the time described by t and now */
double tock(struct timeval *t) {
    struct timeval now;
    gettimeofday(&now, NULL);
    return (double)(now.tv_sec - t->tv_sec) + ((double)(now.tv_usec - t->tv_usec)/1000000.);
}
Running this gives a better speedup than I would have expected:
$ for NT in 1 2 4 8; do export OMP_NUM_THREADS=${NT}; echo; echo "NTHREADS=${NT}";./inner; done
NTHREADS=1
using omp: sum = 500000500000 time = 0.004675
single threaded: sum = 500000500000 time = 0.004765
correct answer: sum = 500000500000
NTHREADS=2
using omp: sum = 500000500000 time = 0.002317
single threaded: sum = 500000500000 time = 0.004773
correct answer: sum = 500000500000
NTHREADS=4
using omp: sum = 500000500000 time = 0.001205
single threaded: sum = 500000500000 time = 0.004758
correct answer: sum = 500000500000
NTHREADS=8
using omp: sum = 500000500000 time = 0.000617
single threaded: sum = 500000500000 time = 0.004784
correct answer: sum = 500000500000
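To read the speedup off these timings, divide the single-threaded time by the OpenMP time from the same run. A small sketch of that arithmetic follows; the helper names speedup and efficiency are made up for illustration.

```cpp
#include <cassert>

// Hypothetical helpers (illustration only).
// Speedup: how many times faster the parallel run is than the serial one.
double speedup(double serial_seconds, double parallel_seconds) {
    assert(serial_seconds > 0.0 && parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}

// Parallel efficiency: speedup per thread; 1.0 would be ideal linear scaling.
double efficiency(double serial_seconds, double parallel_seconds, int nthreads) {
    assert(nthreads > 0);
    return speedup(serial_seconds, parallel_seconds) / nthreads;
}
```

With the NTHREADS=8 numbers above, speedup(0.004784, 0.000617) comes out at roughly 7.75, which is an efficiency of about 0.97 on 8 threads.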
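Running two or more concurrent inner products... Of course, you could also write them without that `inner_product` call, since they have the property of decomposing nicely like this. –
Since you might run into the nice OpenMP feature of dynamic teams (given the proper environment) and get `1`, you could use `omp_get_max_threads()` instead of the `omp_get_num_threads()` in a `single` construct trick. –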
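A related way to sidestep the team-size concern raised in the comment above is to query the team size inside the same parallel region that does the work, so the chunk bounds always match the threads that actually run. This is a different technique from the omp_get_max_threads() suggestion, and only a sketch: the function name chunked_inner_product and the use of std::vector<long> are assumptions for illustration.

```cpp
#include <cassert>
#include <numeric>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (illustration only): per-thread std::inner_product
// over contiguous chunks, with the team size read inside the region.
long chunked_inner_product(const std::vector<long>& a,
                           const std::vector<long>& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    long sum = 0;
    #pragma omp parallel reduction(+:sum)
    {
#ifdef _OPENMP
        const long nthreads = omp_get_num_threads(); // size of this team
        const long tid = omp_get_thread_num();
#else
        const long nthreads = 1;                     // serial fallback
        const long tid = 0;
#endif
        const long nitems = n / nthreads;
        const long start = tid * nitems;
        // The last thread also takes the remainder.
        const long end = (tid == nthreads - 1) ? n : start + nitems;
        sum += std::inner_product(a.begin() + start, a.begin() + end,
                                  b.begin() + start, 0L);
    }
    return sum;
}
```

Because the chunking depends only on the team that executes the region, the result stays correct even if the runtime hands out fewer threads than requested.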
You can use OpenMP :) –