2014-01-23

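Atomic memory ordering performance difference

I wrote a small test to check the performance difference between atomic loads with different memory orderings, and I found that the performance is the same for relaxed and sequentially consistent ordering. Is this just the result of a suboptimal compiler implementation, or is it what I should expect on x86 processors? I am using gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3). I compile the test with -O2 optimization (which is why the second test, the one with a simple variable, shows zero execution time).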

Results: 
Start volatile tests with 1000000000 iterations 
volatile test took 689438 microseconds. Last value of local var is 1 
Start simple var tests with 1000000000 iterations 
simple var test took 0 microseconds. Last value of local var is 2 
Start relaxed atomic tests with 1000000000 iterations 
relaxed atomic test took 25655002 microseconds. Last value of local var is 3 
Start sequentially consistent atomic tests with 1000000000 iterations 
sequentially consistent atomic test took 24844000 microseconds. Last value of local var is 4 

Here are the test functions:

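#include <atomic>   // older GCC releases such as 4.4 may name this header <cstdatomic>
#include <chrono>
#include <iostream>

using namespace std;               // cout, endl, memory_order_*
using std::chrono::microseconds;

std::atomic<int> atomic_var; 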
void relaxed_atomic_test(const unsigned iterations) 
{ 
    cout << "Start relaxed atomic tests with " << iterations << " iterations" << endl; 
    const microseconds start(std::chrono::system_clock::now().time_since_epoch()); 
    int local_var = 0; 
    for(unsigned counter = 0; iterations != counter; ++counter) 
    { 
     local_var = atomic_var.load(memory_order_relaxed); 
    } 
    const microseconds end(std::chrono::system_clock::now().time_since_epoch()); 
    cout << "relaxed atomic test took " << (end - start).count() 
     << " microseconds. Last value of local var is " << local_var << endl; 
} 

void sequentially_consistent_atomic_test(const unsigned iterations) 
{ 
    cout << "Start sequentially consistent atomic tests with " 
     << iterations << " iterations" << endl; 
    const microseconds start(std::chrono::system_clock::now().time_since_epoch()); 
    int local_var = 0; 
    for(unsigned counter = 0; iterations != counter; ++counter) 
    { 
     local_var = atomic_var.load(memory_order_seq_cst); 
    } 
    const microseconds end(std::chrono::system_clock::now().time_since_epoch()); 
    cout << "sequentially consistent atomic test took " << (end - start).count() 
     << " microseconds. Last value of local var is " << local_var << endl; 
} 

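UPDATE: I tried the same test, but this time I wrote to the atomic variable instead of reading it. The results were very different: a store to the atomic with memory_order_relaxed took the same time as a store to a volatile variable: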

Start volatile tests with 1000000000 iterations 
volatile test took 764088 microseconds. Last volatile_var value 999999999 
Start simple var tests with 1000000000 iterations 
simple var test took 0 microseconds. Last var value999999999 
Start relaxed atomic tests with 1000000000 iterations 
relaxed atomic test took 763968 microseconds. Last atomic_var value 999999999 
Start sequentially consistent atomic tests with 1000000000 iterations 
sequentially consistent atomic test took 15287267 microseconds. Last atomic_var value 999999999 

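So I can conclude that, within a single thread and with this processor and compiler, an atomic with relaxed memory ordering behaves like a volatile for store operations, and like an atomic with sequentially consistent ordering for load operations.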
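The store variant of the test is not shown above, so here is a minimal sketch of what it might look like. The helper name `timed_stores` and the use of `std::chrono::steady_clock` are my assumptions, not the original code; the memory order is a template parameter so that the compiler sees it as a compile-time constant.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

std::atomic<unsigned> atomic_var(0);

// Hypothetical helper (not from the original post): performs `iterations`
// stores with the memory order `Order` and returns the elapsed microseconds.
template <std::memory_order Order>
long long timed_stores(const unsigned iterations)
{
    // steady_clock is monotonic, so the difference cannot go negative.
    const auto start = std::chrono::steady_clock::now();
    for (unsigned counter = 0; iterations != counter; ++counter)
    {
        atomic_var.store(counter, Order);
    }
    const auto end = std::chrono::steady_clock::now();

    // Sanity check: the last value stored is visible to this thread.
    if (iterations != 0)
    {
        assert(atomic_var.load(std::memory_order_relaxed) == iterations - 1);
    }
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Calling `timed_stores<std::memory_order_relaxed>(n)` and `timed_stores<std::memory_order_seq_cst>(n)` with the same `n` gives the two timings to compare. The absolute numbers will depend on the compiler version and the CPU.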

Answer


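x86 is a relatively strict architecture with respect to memory ordering, so you are likely to see similar performance between the two. You will see bigger differences on architectures that allow more reordering, such as POWER.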
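To see this for yourself, you can compile a few one-line wrappers and inspect the assembly (for example with `g++ -O2 -S`). The sketch below uses hypothetical names (`shared_value`, `load_relaxed`, and so on). The comments describe what current GCC and Clang commonly emit on x86-64; treat them as an expectation to verify, since older or different compilers may generate something else.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> shared_value(0);   // hypothetical variable for illustration

// On x86-64 both loads are commonly compiled to a plain `mov`, because the
// hardware does not reorder a load with other loads.
int load_relaxed() { return shared_value.load(std::memory_order_relaxed); }
int load_seq_cst() { return shared_value.load(std::memory_order_seq_cst); }

// A relaxed store is commonly a plain `mov`; a seq_cst store needs a full
// barrier, commonly emitted as `xchg` or as `mov` followed by `mfence`.
void store_relaxed(int v) { shared_value.store(v, std::memory_order_relaxed); }
void store_seq_cst(int v) { shared_value.store(v, std::memory_order_seq_cst); }

// Single-threaded round trip: every ordering returns the value just stored.
void round_trip_check()
{
    store_relaxed(1);
    assert(load_seq_cst() == 1);
    store_seq_cst(2);
    assert(load_relaxed() == 2);
}
```

If the seq_cst store is the only function that contains a barrier instruction, that matches the store timings in the question. On a weakly ordered architecture such as POWER you would expect extra barrier instructions around the seq_cst load as well.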