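Ideal task time per thread?

I want to find out how much faster a task completes with multiple threads than with a single thread, depending on the size of the task. What is the ideal task time per thread?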

I plotted a graph:

x-axis: how long the task takes on a single thread. y-axis: how many times faster the same task completes on two threads.

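What I would expect:

As the task gets longer, the overhead of creating threads matters less, so the ratio (t_single/t_multi) increases.
Since I use two threads, I would expect the ratio (t_single/t_multi) to converge to 2 (two threads => twice as fast as one thread).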
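One way to write this expectation down, as a rough sketch only: assume that creating and joining the threads adds a fixed cost $c$ (a hypothetical symbol, not something measured in the post) and that the work itself splits perfectly across the two threads.

```latex
\[
t_{\text{multi}} \approx \frac{t_{\text{single}}}{2} + c,
\qquad
\frac{t_{\text{single}}}{t_{\text{multi}}}
\approx \frac{2}{1 + 2c / t_{\text{single}}}
\;\longrightarrow\; 2
\quad \text{as } t_{\text{single}} \to \infty .
\]
```

Under this simple model the ratio rises monotonically and never exceeds 2, which is why a peak above 2 needs a separate explanation.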

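What I get:

A peak at a single-thread task time of about 10E-2 seconds.
The peak is at 2.5 (the multithreaded run is 2.5 times faster than the single-threaded run).

How can this be explained?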

Each point in the plot is the mean of 10 measurements. I ran it on a 24-core Linux machine.

CODE:

#include <string> 
#include <iostream> 
#include <thread> 
#include <vector> 
#include <ctime> 
#include <cmath>
#include <chrono> 

using namespace std; 
using namespace std::chrono; 

// sums the integers 0 .. number-1
// (the result is never used)
void task(int number) 
{ 
    int s = 0; 
    for(int i=0; i<number; i++){ 
     s = s + i; 
    } 
    // cout << "the sum is " << s << endl; 
} 

double get_time_single(int m){ 

    // init 
    int n_threads = 2; 
    int n = pow(10, m); 

    high_resolution_clock::time_point start = high_resolution_clock::now(); 

    for(int jobs = 0; jobs < n_threads; jobs++){ 
     task(n); 
    } 

    high_resolution_clock::time_point end = high_resolution_clock::now(); 
    double time_single = duration<double, std::milli>(end - start).count(); 

    return time_single; 
} 

double get_time_multi(int m){ 

    // init 
    int n_threads = 2; 
    int n = pow(10, m); 
    vector<thread> threads; 

    high_resolution_clock::time_point start = high_resolution_clock::now(); 

     // execute threads 
     for(int i = 1; i < n_threads + 1; i++){ 
      threads.push_back(thread(task, n)); 
     } 

     // join threads
     for(int i = 0; i < n_threads; i++){ 
      threads.at(i).join(); 
     } 

     high_resolution_clock::time_point end = high_resolution_clock::now(); 
     double time_multi = duration<double, std::milli>(end - start).count(); 

    return time_multi; 
} 


int main() 
{ 

    // print header of magnitude - multi-proc-time - single-proc-time table 
    cout << "mag" << "\t" << "time multi" << " \t" << "time single" << endl; 
    cout << "-------------------------------------" << endl; 

    // iterate through different task magnitudes 
    for(int m = 3; m<10; m++){ 

     double t_single = 0; 
     double t_multi = 0; 

     // get the mean over 10 runs 
     for(int i = 0; i < 10; i++){ 
      t_multi = t_multi + get_time_multi(m); 
      t_single = t_single + get_time_single(m); 
     } 

     t_multi = t_multi/10; 
     t_single = t_single/10; 

     cout << m << "\t" << t_multi << " \t" << t_single << endl; 

    } 
}

OUTPUT (times in milliseconds):

mag     time multi      time single
-------------------------------------
3       0.133946        0.0082684
4       0.0666891       0.0393378
5       0.30651         0.681517
6       1.92084         5.19607
7       18.8701         41.1431
8       195.002         381.745
9       1866.32         3606.08
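To make the ratio discussed in the question explicit, the following sketch recomputes t_single/t_multi from the numbers printed above. The names `Row`, `kRows`, `speedup` and `peak_magnitude` are made up for this illustration and do not appear in the original program.

```cpp
#include <cstddef>
#include <vector>

// One row of the output table: task magnitude and mean times in milliseconds.
struct Row {
    int magnitude;
    double t_multi_ms;
    double t_single_ms;
};

// Values copied verbatim from the output above.
const std::vector<Row> kRows = {
    {3, 0.133946, 0.0082684},
    {4, 0.0666891, 0.0393378},
    {5, 0.30651, 0.681517},
    {6, 1.92084, 5.19607},
    {7, 18.8701, 41.1431},
    {8, 195.002, 381.745},
    {9, 1866.32, 3606.08},
};

// How many times faster the two-thread run is than the single-thread run.
double speedup(const Row& r) {
    return r.t_single_ms / r.t_multi_ms;
}

// Magnitude of the row with the largest speedup (rows must not be empty).
int peak_magnitude(const std::vector<Row>& rows) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < rows.size(); ++i) {
        if (speedup(rows[i]) > speedup(rows[best])) {
            best = i;
        }
    }
    return rows[best].magnitude;
}
```

For this table the largest ratio is at magnitude 6, where the single-thread time is about 5.2 ms and the ratio is roughly 2.7. At magnitudes 8 and 9 the ratio settles slightly below 2.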

Source

2017-06-11 Oli Blum

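Don't measure the run time of unoptimized builds, because the results are meaningless. With optimization enabled, your 'task' compiles to 'void task(int) {}'. So tl;dr: you are measuring nonsense; come up with a better example.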

I disabled compiler optimization with the g++ flag "-O0" (as suggested in https://stackoverflow.com/a/5765916/4391129). I still get the same result.

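You should not disable optimization when benchmarking; the results are completely meaningless. Nobody cares about the performance of debug builds, including the people who implement the language.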
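As a hedged illustration of what "a better example" could look like, the sketch below makes each iteration depend on the previous one and stores the result in a `volatile` variable, so an optimizing build cannot simply drop the loop. The xorshift update and the names `task`, `run_task` and `sink` are choices made for this sketch, not something proposed in the thread.

```cpp
#include <cstdint>

// Writing the result here is an observable side effect, so the compiler
// has to actually compute it.
volatile std::uint64_t sink;

// Runs `iterations` steps of a 64-bit xorshift generator and sums the states.
// Each step depends on the previous one, so the loop has no obvious closed form.
std::uint64_t task(std::uint64_t iterations) {
    std::uint64_t x = 88172645463325252ULL;  // arbitrary non-zero seed
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        sum += x;
    }
    return sum;
}

// Thread entry point: keeps the result alive by storing it in `sink`.
void run_task(std::uint64_t iterations) {
    sink = task(iterations);
}
```

With something like this, `std::thread(run_task, n)` could replace `thread(task, n)` in the benchmark, and the program can be built with optimization (for example `-O2`) without the work disappearing.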

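So you get peak multithreaded performance when your task completes in about 5 ms? On Linux, the maximum time slice is usually 6 ms, which may be related to sysctl_sched_latency.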

More information about that setting.

When doing microbenchmarks, people usually report the fastest value rather than the average.
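A minimal sketch of that idea, assuming a helper named `best_of_ms` (a hypothetical name) that runs a callable several times and keeps the smallest wall-clock time:

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Calls f() `runs` times and returns the fastest run in milliseconds.
// Returns +infinity if runs <= 0.
template <typename F>
double best_of_ms(int runs, F&& f) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        f();
        const auto end = clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(end - start).count();
        best = std::min(best, ms);
    }
    return best;
}
```

In the benchmark from the question, the body of `get_time_multi` between the two clock reads could be passed in as a lambda, and the returned minimum would take the place of the mean over 10 runs.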

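Because of CPU caches (the data cache and the micro-op cache), coding these outer loops in C++ is a bad idea. A better approach is to pass the parameters as command-line arguments, write a script that launches your application many times, and collect the results somewhere.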
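A rough sketch of the command-line side of that suggestion follows. The argument layout `<magnitude> <n_threads>` and the names `Options` and `parse_options` are assumptions made for illustration.

```cpp
#include <cstdlib>

// Parameters for one benchmark run.
struct Options {
    int magnitude;  // the task runs 10^magnitude iterations
    int n_threads;  // number of worker threads
};

// Parses "<program> <magnitude> <n_threads>".
// Returns false if the arguments are missing, non-numeric, or out of range.
bool parse_options(int argc, const char* const argv[], Options& out) {
    if (argc != 3) {
        return false;
    }
    char* end = nullptr;
    const long m = std::strtol(argv[1], &end, 10);
    // 10^9 is the largest power of ten that still fits in a 32-bit int.
    if (end == argv[1] || *end != '\0' || m < 0 || m > 9) {
        return false;
    }
    const long t = std::strtol(argv[2], &end, 10);
    if (end == argv[2] || *end != '\0' || t < 1 || t > 1024) {
        return false;
    }
    out.magnitude = static_cast<int>(m);
    out.n_threads = static_cast<int>(t);
    return true;
}
```

`main(int argc, char** argv)` would call `parse_options(argc, argv, opts)`, perform a single measurement, and print one line. A shell loop could then start the program repeatedly for each magnitude and append the output to a file.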

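Update: in general, the ideal task time per thread is the longest one you can afford while still keeping all CPU cores busy all the time and meeting your other requirements (such as latency).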

Source

2017-06-11 16:29:36 Soonts
