2014-12-19 35 views
0

I've recently been reading mikeash's blog, which covers the implementation of dispatch_once in detail, and I also got its source code from macosforge. What does dispatch_atomic_maximally_synchronizing_barrier(); mean?

I understand most of the code, except for this line:

dispatch_atomic_maximally_synchronizing_barrier();

It is a macro, defined as:

#define dispatch_atomic_maximally_synchronizing_barrier() \
    do { unsigned long _clbr; __asm__ __volatile__( \
    "cpuid" \
    : "=a" (_clbr) : "0" (0) : "rbx", "rcx", "rdx", "cc", "memory" \
    ); } while(0)

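I know it is used to "defeat the speculative read-ahead of peer CPUs", but I don't understand cpuid or the operands that follow it. I know very little about assembly language.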

Could anyone explain this to me in detail? Thanks a lot.

+1

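`CPUID` is one of the non-privileged serializing instructions on P6 and later x86. See Volume 3, Section 8 ("Serializing Instructions") of the [Intel 64 and IA-32 Architectures Software Developer's Manual](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html). – Michael 2014-12-19 09:18:46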

Answer

1

The libdispatch source code pretty much explains it.

http://opensource.apple.com/source/libdispatch/libdispatch-442.1.4/src/shims/atomic.h

// see comment in dispatch_once.c 
#define dispatch_atomic_maximally_synchronizing_barrier() \ 

http://opensource.apple.com/source/libdispatch/libdispatch-442.1.4/src/once.c

// The next barrier must be long and strong. 
// 
// The scenario: SMP systems with weakly ordered memory models 
// and aggressive out-of-order instruction execution. 
// 
// The problem: 
// 
// The dispatch_once*() wrapper macro causes the callee's 
// instruction stream to look like this (pseudo-RISC): 
// 
//  load r5, pred-addr 
//  cmpi r5, -1 
//  beq 1f 
//  call dispatch_once*() 
//  1f: 
//  load r6, data-addr 
// 
// May be re-ordered like so: 
// 
//  load r6, data-addr 
//  load r5, pred-addr 
//  cmpi r5, -1 
//  beq 1f 
//  call dispatch_once*() 
//  1f: 
// 
// Normally, a barrier on the read side is used to workaround 
// the weakly ordered memory model. But barriers are expensive 
// and we only need to synchronize once! After func(ctxt) 
// completes, the predicate will be marked as "done" and the 
// branch predictor will correctly skip the call to 
// dispatch_once*(). 
// 
// A far faster alternative solution: Defeat the speculative 
// read-ahead of peer CPUs. 
// 
// Modern architectures will throw away speculative results 
// once a branch mis-prediction occurs. Therefore, if we can 
// ensure that the predicate is not marked as being complete 
// until long after the last store by func(ctxt), then we have 
// defeated the read-ahead of peer CPUs. 
// 
// In other words, the last "store" by func(ctxt) must complete 
// and then N cycles must elapse before ~0l is stored to *val. 
// The value of N is whatever is sufficient to defeat the 
// read-ahead mechanism of peer CPUs. 
// 
// On some CPUs, the most fully synchronizing instruction might 
// need to be issued. 

dispatch_atomic_maximally_synchronizing_barrier(); 
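The ordering that comment asks for (the initializer's last store, then a long delay, then the store of ~0l to the predicate) can be sketched roughly as below. This is only an illustration, not the libdispatch implementation: `max_sync_barrier`, `once_sketch`, `bump`, and `demo_once_runs` are hypothetical names, the sketch is single-threaded, and it leaves out the part of the real dispatch_once that arbitrates between concurrent callers. The cpuid path is assumed for x86-64 only; every other target falls back to `__sync_synchronize()`.

```c
/* Hypothetical stand-in for dispatch_atomic_maximally_synchronizing_barrier(). */
static inline void max_sync_barrier(void) {
#if defined(__x86_64__)
    unsigned int eax = 0, ebx, ecx, edx;
    /* cpuid leaf 0: the results are discarded, only the serializing
       side effect is wanted here. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         :
                         : "cc", "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;
#else
    __sync_synchronize(); /* full memory barrier on other architectures */
#endif
}

typedef void (*init_fn)(void *);

/* Assumption: one caller at a time. The real code also handles racing
   callers; that logic is omitted from this sketch. */
void once_sketch(long *pred, void *ctxt, init_fn func) {
    if (*pred == ~0l) {
        return;             /* fast path: already initialized, no barrier */
    }
    func(ctxt);             /* every store made by the initializer happens here */
    max_sync_barrier();     /* "long and strong" delay on the write side */
    *pred = ~0l;            /* only now mark the predicate as done */
}

/* Hypothetical initializer that counts how often it runs. */
static void bump(void *ctxt) { ++*(int *)ctxt; }

/* Calls once_sketch twice and reports how many times the initializer ran. */
int demo_once_runs(void) {
    long pred = 0;
    int runs = 0;
    once_sketch(&pred, &runs, bump);
    once_sketch(&pred, &runs, bump);
    return runs;
}
```

The point of the design is that the cost sits entirely on the one-time write side, so the read-side fast path stays a plain load and compare.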

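For the 64-bit x86 and i386 architectures, it uses the CPUID instruction, as @Michael mentioned, to flush the instruction pipeline. cpuid is a serializing instruction, which prevents memory reordering. Other architectures use __sync_synchronize.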

https://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Atomic-Builtins.html

__sync_synchronize (...) 
This builtin issues a full memory barrier. 

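These builtins are considered a full barrier. That is, no memory operand will be moved across the operation, either forward or backward. Further, instructions will be issued as necessary to prevent the processor from speculating loads across the operation and from queuing stores after the operation.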
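As for the operands that follow "cpuid" in the macro, here is a hedged reading based on GCC's extended asm syntax, with the macro body copied into a function and commented piece by piece. The function name `barrier_annotated` and its return value are made up for illustration; the x86-64 branch mirrors the macro quoted in the question, and the fallback branch assumes `__sync_synchronize()` for everything else.

```c
/* Returns 1 when the x86-64 cpuid path was compiled in, 0 for the fallback. */
int barrier_annotated(void) {
#if defined(__x86_64__)
    unsigned long _clbr;        /* scratch: receives whatever cpuid leaves in rax */
    __asm__ __volatile__(       /* volatile: the compiler must not delete or move it */
        "cpuid"                 /* the instruction itself */
        : "=a" (_clbr)          /* outputs: operand 0, write-only, lives in rax/eax */
        : "0" (0)               /* inputs: same register as operand 0, preloaded
                                   with 0, i.e. cpuid leaf 0 */
        : "rbx", "rcx", "rdx",  /* clobbers: cpuid overwrites these registers */
          "cc",                 /* condition flags are treated as modified */
          "memory"              /* compiler-level barrier: no memory access is
                                   cached in registers or moved across the asm */
    );
    (void)_clbr;                /* the value is never used */
    return 1;
#else
    __sync_synchronize();       /* full memory barrier on other architectures */
    return 0;
#endif
}
```

Read this way, the operand list carries no data of interest: it only tells the compiler which registers and state the instruction touches, so that the surrounding code is not miscompiled.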

+0

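I have read the links you mentioned, but I am still confused by the phrases "flush the instruction pipeline" and "serializing instruction". It seems to me that `dispatch_atomic_maximally_synchronizing_barrier();` acts like a function that waits for the other CPUs' speculation to fail. Am I right? I know `cpuid` is an instruction, but I would like to know what the operands after `cpuid` mean: `"cpuid" \ : "=a" (_clbr) : "0" (0) : "rbx", "rcx", "rdx", "cc", "memory" \ .....` – KudoCC 2014-12-22 01:32:48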

+0

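I read the [GCC-Inline-Assembly-HOWTO](http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html) and realized that `cpuid` does no magic here: it just fetches CPU information, which takes a fair amount of time to execute, and that delay makes the other CPUs' speculation fail. – KudoCC 2014-12-22 02:42:24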