Why does GCC std::atomic increment generate inefficient non-atomic assembly?

I've been using gcc's Intel-compatible builtins (like __sync_fetch_and_add) for quite a while, with my own atomic template. The "__sync" functions are now officially considered "legacy".

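C++11 supports std::atomic<> and its descendants, so it seems reasonable to use those instead: it makes my code standard-compliant, and the compiler will produce the best code either way, in a platform-independent manner, which is almost too good to be true.
By the way, I only had to text-replace atomic with std::atomic. There's a lot in std::atomic (re: memory model) that I don't really need, but default parameters take care of that.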

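Now for the bad news. As it turns out, from what I can tell, the generated code is... utter garbage, and not even atomic at all. Even a minimal example that increments a single atomic variable and outputs it has no fewer than 5 non-inlined function calls to ___atomic_flag_for_address, ___atomic_flag_wait_explicit, and __atomic_flag_clear_explicit (fully optimized), and on the other hand, not a single atomic instruction in the generated executable.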

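What gives? There's always the possibility of a compiler bug, of course, but with the huge number of reviewers and users, such rather drastic things usually don't go unnoticed. Which means this is probably not a bug, but intended behavior.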

What is the "rationale" behind so many function calls, and how is atomicity implemented without atomicity?

As-simple-as-it-can-get example:

#include <atomic> 

int main() 
{ 
    std::atomic_int a(5); 
    ++a; 
    __builtin_printf("%d", (int)a); 
    return 0; 
}

produces the following .s:

movl $5, 28(%esp)  #, a._M_i 
movl %eax, (%esp)  # tmp64, 
call ___atomic_flag_for_address # 
movl $5, 4(%esp) #, 
movl %eax, %ebx #, __g 
movl %eax, (%esp)  # __g, 
call ___atomic_flag_wait_explicit  # 
movl %ebx, (%esp)  # __g, 
addl $1, 28(%esp)  #, MEM[(__i_type *)&a] 
movl $5, 4(%esp) #, 
call _atomic_flag_clear_explicit # 
movl %ebx, (%esp)  # __g, 
movl $5, 4(%esp) #, 
call ___atomic_flag_wait_explicit  # 
movl 28(%esp), %esi # MEM[(const __i_type *)&a], __r 
movl %ebx, (%esp)  # __g, 
movl $5, 4(%esp) #, 
call _atomic_flag_clear_explicit # 
movl $LC0, (%esp)  #, 
movl %esi, 4(%esp) # __r, 
call _printf # 
(...) 
.def ___atomic_flag_for_address; .scl 2; .type 32; .endef 
.def ___atomic_flag_wait_explicit; .scl 2; .type 32; .endef 
.def _atomic_flag_clear_explicit; .scl 2; .type 32; .endef

...and the functions mentioned look, for example, like this in objdump:

004013c4 <__atomic_flag_for_address>: 
mov 0x4(%esp),%edx 
mov %edx,%ecx 
shr $0x2,%ecx 
mov %edx,%eax 
shl $0x4,%eax 
add %ecx,%eax 
add %edx,%eax 
mov %eax,%ecx 
shr $0x7,%ecx 
mov %eax,%edx 
shl $0x5,%edx 
add %ecx,%edx 
add %edx,%eax 
mov %eax,%edx 
shr $0x11,%edx 
add %edx,%eax 
and $0xf,%eax 
add $0x405020,%eax 
ret

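The rest are somewhat simpler, but I did not find a single instruction that would actually be atomic (except for some spurious xchg, which are atomic on x86, but those seem to be NOP/padding, since it's xchg %ax,%ax following a ret).

I have absolutely no idea what such a rather complicated function is needed for, and how it is supposed to make anything atomic.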


2011-11-14 Damon

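Which version of GCC are you using? Can you show a small program that produces this bad code? I'm running last month's 4.7 snapshot, and it seems to generate decent code, with 'lock' instructions.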

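I'd suspect the memory model you "don't need" as a possible culprit. What does your code look like? And what do you mean by your last sentence: "how is atomicity implemented without atomicity"? – jalf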

By "memory model", do you mean "memory ordering"?

This is a deficiency in how your compiler was built.

Check c++config.h; it should look like this, but it doesn't:

/* Define if builtin atomic operations for bool are supported on this host. */ 
#define _GLIBCXX_ATOMIC_BUILTINS_1 1 

/* Define if builtin atomic operations for short are supported on this host. 
    */ 
#define _GLIBCXX_ATOMIC_BUILTINS_2 1 

/* Define if builtin atomic operations for int are supported on this host. */ 
#define _GLIBCXX_ATOMIC_BUILTINS_4 1 

/* Define if builtin atomic operations for long long are supported on this 
    host. */ 
#define _GLIBCXX_ATOMIC_BUILTINS_8 1

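These macros are defined or not depending on configure tests that check the host's support for the __sync_XXX functions. The tests are in libstdc++-v3/acinclude.m4, AC_DEFUN([GLIBCXX_ENABLE_ATOMIC_BUILTINS] ....

In your installation, the MEM[(__i_type *)&a] that -fverbose-asm puts into the assembly file makes it evident that the compiler uses the macros from atomic_0.h, for example: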

#define _ATOMIC_LOAD_(__a, __x)      \ 
    ({typedef __typeof__(_ATOMIC_MEMBER_) __i_type;       \ 
    __i_type* __p = &_ATOMIC_MEMBER_;      \ 
    __atomic_flag_base* __g = __atomic_flag_for_address(__p);   \ 
    __atomic_flag_wait_explicit(__g, __x);     \ 
    __i_type __r = *__p;       \ 
    atomic_flag_clear_explicit(__g, __x);      \ 
    __r; })

With a compiler built with the proper builtins, your example program, compiled with c++ -m32 -std=c++0x -S -O2 -march=core2 -fverbose-asm, should produce something like this:

movl $5, 28(%esp) #, a.D.5442._M_i 
lock addl $1, 28(%esp) #, 
mfence 
movl 28(%esp), %eax # MEM[(const struct __atomic_base *)&a].D.5442._M_i, __ret 
mfence 
movl $.LC0, (%esp) #, 
movl %eax, 4(%esp) # __ret, 
call printf #


2011-11-14 21:00:06 chill

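Guess what: editing 'c++config.h' to contain these defines fixes the problem and gives me exactly the 'lock addl' / 'mfence' sequence you posted above, which is what I want. (I'll forward the issue to my compiler builder.) Thank you very much. – Damon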

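There are two implementations: one uses the __sync primitives, the other uses none. Plus two mixtures that use only some of the primitives. Which one is chosen depends on the macros _GLIBCXX_ATOMIC_BUILTINS_1, _GLIBCXX_ATOMIC_BUILTINS_2, _GLIBCXX_ATOMIC_BUILTINS_4 and _GLIBCXX_ATOMIC_BUILTINS_8.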

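At least the first one is needed for the mixed implementation, and all of them are needed for the fully atomic one. It seems that whether they are defined depends on the target machine (they may not be defined for -march=i386 and should be defined for -march=i686).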


2011-11-14 12:22:30

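None of them are defined here, although the atomic insns certainly work with the '__sync' functions (I compile with '-march=core2') without any problems. I tried defining these macros before including '<atomic>', just to see whether that makes a difference; it doesn't. So basically you're saying this may be a kind of "poor man's fallback implementation"? In that case, how would I enable the real one (without compiling my own gcc)? – Damon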
