The libdispatch source code pretty much explains it.
http://opensource.apple.com/source/libdispatch/libdispatch-442.1.4/src/shims/atomic.h
// see comment in dispatch_once.c
#define dispatch_atomic_maximally_synchronizing_barrier() \
http://opensource.apple.com/source/libdispatch/libdispatch-442.1.4/src/once.c
// The next barrier must be long and strong.
//
// The scenario: SMP systems with weakly ordered memory models
// and aggressive out-of-order instruction execution.
//
// The problem:
//
// The dispatch_once*() wrapper macro causes the callee's
// instruction stream to look like this (pseudo-RISC):
//
// load r5, pred-addr
// cmpi r5, -1
// beq 1f
// call dispatch_once*()
// 1f:
// load r6, data-addr
//
// May be re-ordered like so:
//
// load r6, data-addr
// load r5, pred-addr
// cmpi r5, -1
// beq 1f
// call dispatch_once*()
// 1f:
//
// Normally, a barrier on the read side is used to workaround
// the weakly ordered memory model. But barriers are expensive
// and we only need to synchronize once! After func(ctxt)
// completes, the predicate will be marked as "done" and the
// branch predictor will correctly skip the call to
// dispatch_once*().
//
// A far faster alternative solution: Defeat the speculative
// read-ahead of peer CPUs.
//
// Modern architectures will throw away speculative results
// once a branch mis-prediction occurs. Therefore, if we can
// ensure that the predicate is not marked as being complete
// until long after the last store by func(ctxt), then we have
// defeated the read-ahead of peer CPUs.
//
// In other words, the last "store" by func(ctxt) must complete
// and then N cycles must elapse before ~0l is stored to *val.
// The value of N is whatever is sufficient to defeat the
// read-ahead mechanism of peer CPUs.
//
// On some CPUs, the most fully synchronizing instruction might
// need to be issued.
dispatch_atomic_maximally_synchronizing_barrier();
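On x86_64 and i386, it uses the cpuid instruction to flush the instruction pipeline, as @Michael mentioned. cpuid is a serializing instruction, so it prevents memory reordering across it. On other architectures, it uses __sync_synchronize.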
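As a rough illustration of the ordering described above, here is a minimal sketch in C for GCC or Clang. The names `maximally_synchronizing_barrier`, `sketch_once`, and `init_value` are hypothetical and are not libdispatch's own. The sketch only shows the sequence "run func(ctxt), issue the heavy barrier, then store ~0l to the predicate". It leaves out the atomic claim and waiter handling that a real once implementation needs, so it is not safe for concurrent callers.

```c
#include <stdint.h>

/* Hypothetical helper: cpuid on x86, __sync_synchronize elsewhere. */
static inline void maximally_synchronizing_barrier(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t eax = 0, ebx, ecx = 0, edx;
    /* cpuid is a serializing instruction; the "memory" clobber also
     * keeps the compiler from moving memory accesses across it. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
#else
    __sync_synchronize();   /* full memory barrier builtin */
#endif
}

/* Hypothetical, simplified once: NOT safe for concurrent callers. */
static void sketch_once(long *pred, void (*func)(void *), void *ctxt)
{
    /* Fast path: predicate already marked done, no barrier needed. */
    if (__atomic_load_n(pred, __ATOMIC_RELAXED) == ~0l)
        return;

    func(ctxt);                          /* last stores by func(ctxt) */
    maximally_synchronizing_barrier();   /* the "long and strong" barrier */
    __atomic_store_n(pred, ~0l, __ATOMIC_RELAXED);   /* mark as done */
}

/* Hypothetical initializer used to exercise sketch_once. */
static void init_value(void *ctxt)
{
    *(int *)ctxt = 42;
}
```

Calling `sketch_once(&pred, init_value, &value)` twice with a zero-initialized `pred` runs `init_value` only the first time. The second call takes the fast path, which is the case the once.c comment wants to keep barrier-free.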
https://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Atomic-Builtins.html
__sync_synchronize (...)
This builtin issues a full memory barrier.
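These builtins are considered a full barrier. That is, no memory operand will be moved across the operation, either forward or backward. Further, instructions will be issued as necessary to prevent the processor from speculating loads across the operation and from queuing stores after the operation.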
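For contrast, the conventional approach that the once.c comment refers to ("a barrier on the read side") can be sketched with `__sync_synchronize` on both the writer and the reader. The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical. This is a minimal illustration of the pattern, not code from libdispatch.

```c
/* Hypothetical shared state for the illustration. */
static int payload;
static int ready;

/* Writer: store the data, full barrier, then set the flag. */
static void publish(int value)
{
    payload = value;
    __sync_synchronize();   /* data store is ordered before the flag store */
    __atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

/* Reader: check the flag, full barrier, then read the data.
 * Returns 1 and fills *out if the data has been published, else 0. */
static int try_consume(int *out)
{
    if (!__atomic_load_n(&ready, __ATOMIC_RELAXED))
        return 0;
    __sync_synchronize();   /* read-side barrier, paid on every call */
    *out = payload;
    return 1;
}
```

The read-side barrier here runs on every successful `try_consume` call. That recurring cost is what dispatch_once avoids by paying for one much heavier barrier on the write side.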
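`CPUID` is one of the non-privileged serializing instructions on P6 and later x86 processors. See Volume 3, Section 8 ("Serializing Instructions") of the [Intel 64 and IA-32 Architectures Software Developer's Manual](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html). – Michael 2014-12-19 09:18:46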