I want to investigate how much faster a task completes with multiple threads compared to a single thread, depending on the task size — i.e., what is the ideal per-thread task duration?

I plotted a chart:
- x-axis: how long the task takes on a single thread. y-axis: how much faster the same task completes on two threads.

What I would expect:
- As the task gets longer, the overhead of creating threads matters less, so the ratio (t_single/t_multi) increases.
- With two threads, I would expect the ratio (t_single/t_multi) to converge to 2 (two threads => twice as fast as one thread).

What I actually get:
- A peak at a single-threaded task time of about 10E-2 seconds.
- The peak is at about 2.5 (the multithreaded version is 2.5 times faster than the single-threaded one).

How can this be explained?

The chart was created by averaging over 10 measurements. I ran it on a 24-core Linux machine.
CODE:
#include <string>
#include <iostream>
#include <thread>
#include <vector>
#include <ctime>
#include <math.h>
#include <chrono>

using namespace std;
using namespace std::chrono;

// busy-work task: sums the integers 0..number-1
void task(int number)
{
    int s = 0;
    for(int i = 0; i < number; i++){
        s = s + i;
    }
    // cout << "the sum is " << s << endl;
}

// run the task n_threads times sequentially, so the total work
// matches the multithreaded version, and return the elapsed time
double get_time_single(int m){
    // init
    int n_threads = 2;
    int n = pow(10, m);
    high_resolution_clock::time_point start = high_resolution_clock::now();
    for(int jobs = 0; jobs < n_threads; jobs++){
        task(n);
    }
    high_resolution_clock::time_point end = high_resolution_clock::now();
    double time_single = duration<double, std::milli>(end - start).count();
    return time_single;
}

// run the task once on each of n_threads threads in parallel
// and return the elapsed wall-clock time
double get_time_multi(int m){
    // init
    int n_threads = 2;
    int n = pow(10, m);
    vector<thread> threads;
    high_resolution_clock::time_point start = high_resolution_clock::now();
    // launch threads
    for(int i = 0; i < n_threads; i++){
        threads.push_back(thread(task, n));
    }
    // join threads
    for(int i = 0; i < n_threads; i++){
        threads.at(i).join();
    }
    high_resolution_clock::time_point end = high_resolution_clock::now();
    double time_multi = duration<double, std::milli>(end - start).count();
    return time_multi;
}

int main()
{
    // print header of the magnitude / multi-thread time / single-thread time table
    cout << "mag" << "\t" << "time multi" << " \t" << "time single" << endl;
    cout << "-------------------------------------" << endl;
    // iterate through different task magnitudes
    for(int m = 3; m < 10; m++){
        double t_single = 0;
        double t_multi = 0;
        // take the mean over 10 runs
        for(int i = 0; i < 10; i++){
            t_multi = t_multi + get_time_multi(m);
            t_single = t_single + get_time_single(m);
        }
        t_multi = t_multi/10;
        t_single = t_single/10;
        cout << m << "\t" << t_multi << " \t" << t_single << endl;
    }
}
OUTPUT:
mag time multi time single
-------------------------------------
3 0.133946 0.0082684
4 0.0666891 0.0393378
5 0.30651 0.681517
6 1.92084 5.19607
7 18.8701 41.1431
8 195.002 381.745
9 1866.32 3606.08
COMMENTS:
Don't measure the runtime of unoptimized builds, because the results are meaningless. With optimizations enabled, your `task` compiles down to `void task(int){}`. So tl;dr: you are measuring nonsense; pick a better example. –
I disabled optimizations with the g++ compiler flag `-O0` (as [suggested](https://stackoverflow.com/a/5765916/4391129)). I still get the same results. –
You should not disable optimizations when benchmarking; the results are completely meaningless. Nobody cares about the performance of debug builds, including the people who implement the language. –