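OpenMP slow reduction
I wrote simple C++ code that computes the sum of an array with a reduction, but the version that uses the OpenMP reduction runs slowly. The program has two variants: one is the simplest sum, the other is the sum of a complex math function. The complex variant is commented out in the code. The program is compiled with: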
icpc reduction.cpp -openmp -o reduction -O3
g++ reduction.cpp -fopenmp -o reduction -O3
Processor: Intel Core 2 T5850, OS: Ubuntu 10.10
#include <iostream>
#include <ctime>   // clock(), CLOCKS_PER_SEC
#include <omp.h>
#include <math.h>
using namespace std;

#define N 100000000
#define NUM_THREADS 4

int main() {
    int *arr = new int[N];
    for (int i = 0; i < N; i++) {
        arr[i] = i;
    }
    omp_set_num_threads(NUM_THREADS);
    cout << NUM_THREADS << endl;
    clock_t start = clock();  // clock() returns processor time used by the program
    long long sum = 0;        // 64-bit: the sum of 0..N-1 does not fit in an int
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        // sum += sqrt(sqrt(arr[i] * arr[i])); // complex variant
        sum += arr[i]; // simple variant
    }
    double diff = (clock() - start)/(double)CLOCKS_PER_SEC;
    cout << "Time " << diff << "s" << endl;
    cout << sum << endl;
    delete[] arr;
    return 0;
}
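I compile it with icpc and g++ on Ubuntu 10.10.
Below are the execution times of the simple and complex variants, compiled with and without OpenMP.
Simple variant "sum += arr[i];":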
icpc
0.1s without OpenMP
0.18s with OpenMP
g++
0.11s without OpenMP
0.17s with OpenMP
Complex variant "sum += sqrt(sqrt(arr[i] * arr[i]));":
icpc
2.92s without OpenMP
3.37s with OpenMP
g++
47.97s without OpenMP
48.2s with OpenMP
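In the system monitor I see 2 cores working for the OpenMP program and 1 core for the program without OpenMP. I tried several thread counts with OpenMP and got no speedup. I don't understand why the reduction is slow.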
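For the simple version you get about a 2x speedup, and you have 2 cores! – 2011-06-08 16:04:10
Sorry, I mixed up the "with" and "without OpenMP" labels. But my question still stands. – 2011-06-08 16:12:46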