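After learning that shared variables are currently not guarded by memory barriers, I have now run into another problem. Either I am doing something wrong, or the existing compiler optimizations in dmd can break multithreaded code by reordering reads of shared variables.
Compiler optimization breaks multithreaded code
As an example, when I compile an executable with dmd -O (full optimization), the compiler happily optimizes away the local variable o in this code (where cas is the compare-and-swap function from core.atomic):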
shared uint cnt;
void atomicInc () { uint o; do { o = cnt; } while (!cas(&cnt, o, o + 1));}
into something like this (see the disassembly below):
shared uint cnt;
void atomicInc () { while (!cas(&cnt, cnt, cnt + 1)) { } }
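In the "optimized" code, cnt is read from memory twice: once to compute o + 1 and once more as the expected value passed to cas. Another thread can modify cnt between those two reads. The optimization essentially breaks the compare-and-swap algorithm.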
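Is this a bug, or is there a correct way to achieve the intended result? So far, the only workaround I have found is to implement the code in assembler.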
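A minimal sketch of one possible workaround follows. It assumes a current druntime whose core.atomic provides atomicLoad and atomicStore. Whether the 2.049/2.050 druntime had them, and whether that optimizer respects them, has not been checked here.

The loop reads cnt exactly once per attempt through an explicit atomic load into o. Both the expected value and o + 1 are then derived from that single local copy, which should leave the optimizer no plain read of cnt to duplicate. The comment in the test code below already names the library alternative, atomicOp!("+=")(cnt, 1U), which performs the whole increment in one call.

```d
import core.atomic;
import core.thread : ThreadGroup;
import std.stdio : writeln;

enum loops = 0xFFFF;
enum threads = 8;

shared uint cnt;

// Take exactly one snapshot of cnt per attempt. The expected value passed
// to cas and the value written back (o + 1) both come from the local copy o,
// so a concurrent change to cnt makes cas fail and the loop retry.
void atomicInc()
{
    uint o;
    do
    {
        o = atomicLoad(cnt);
    } while (!cas(&cnt, o, o + 1));
}

// Library alternative mentioned in the test code: atomicOp!"+="(cnt, 1U);

void threadFunc() { foreach (i; 0 .. loops) atomicInc(); }

void main()
{
    auto tg = new ThreadGroup;
    foreach (i; 0 .. threads)
        tg.create(&threadFunc);
    tg.joinAll();
    writeln("CAS : ", atomicLoad(cnt) == loops * threads ? "passed" : "failed");
}
```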
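Full test code and additional details
For completeness, here is complete test code that demonstrates both problems (missing memory barriers and the optimization problem). On three different Windows machines, with both DMD 2.049 and DMD 2.050, it produces the following output (assuming Dekker's algorithm does not deadlock, which can happen):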
dmd -O -run optbug.d
CAS : failed
Dekker: failed
With full optimization, the loop inside atomicInc is compiled to this:
; cnt is stored at 447C10h
; while (!cas(&cnt, o, o + 1)) o = cnt;
; 1) prepare call cas(&cnt, o, o + 1): &cnt and o go to stack, o+1 to eax
402027: mov ecx,447C10h ; ecx = &cnt
40202C: mov eax,[447C10h] ; eax = o1 = cnt
402031: inc eax ; eax = o1 + 1 (third parameter)
402032: push ecx ; push &cnt (first parameter)
; next instruction pushes current value of cnt onto stack
; as second parameter o instead of re-using o1
402033: push [447C10h]
402039: call 4020BC ; 2) call cas
40203E: xor al,1 ; 3) test success
402040: jne 402027 ; no success try again
; end of main loop
Here is the test code:
import core.atomic;
import core.thread;
import std.stdio;
enum loops = 0xFFFF;
shared uint cnt;
/* *****************************************************************************
Implement atomicOp!("+=")(cnt, 1U); with CAS. The code below doesn't work with
the "-O" compiler flag because cnt is read twice while calling cas and another
thread can modify cnt in between.
*/
enum threads = 8;
void atomicInc () { uint o; do { o = cnt; } while (!cas(&cnt, o, o + 1));}
void threadFunc () { foreach (i; 0..loops) atomicInc; }
void testCas () {
    cnt = 0;
    auto tgCas = new ThreadGroup;
    foreach (i; 0..threads) tgCas.create(&threadFunc);
    tgCas.joinAll;
    writeln("CAS : ", cnt == loops * threads ? "passed" : "failed");
}
/* *****************************************************************************
Dekker's algorithm. Fails on ia32 (other than Atom) because ia32 can reorder a
read before an earlier write. Most likely fails on many other architectures.
*/
shared bool flag1 = false;
shared bool flag2 = false;
shared bool turn2 = false; // avoids starvation by executing 1 and 2 in turns
void dekkerInc () {
    flag1 = true;
    while (flag2) if (turn2) {
        flag1 = false; while (turn2) { /* wait until my turn */ }
        flag1 = true;
    }
    cnt++; // shouldn't work without a cast
    turn2 = true; flag1 = false;
}
void dekkerDec () {
    flag2 = true;
    while (flag1) if (!turn2) {
        flag2 = false; while (!turn2) { /* wait until my turn */ }
        flag2 = true;
    }
    cnt--; // shouldn't work without a cast
    turn2 = false; flag2 = false;
}
void threadDekkerInc () { foreach (i; 0..loops) dekkerInc; }
void threadDekkerDec () { foreach (i; 0..loops) dekkerDec; }
void testDekker () {
    cnt = 0;
    auto tgDekker = new ThreadGroup;
    tgDekker.create(&threadDekkerInc);
    tgDekker.create(&threadDekkerDec);
    tgDekker.joinAll;
    writeln("Dekker: ", cnt == 0 ? "passed" : "failed");
}
/* ************************************************************************** */
void main() {
    testCas;
    testDekker;
}
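The Dekker test fails for a different reason, as its comment notes: x86 may let a thread's read of the other flag pass its own earlier write to its flag. A minimal sketch of the usual remedy follows. It assumes current druntime's core.atomic, where atomicLoad, atomicStore and atomicFence all default to sequentially consistent ordering.

Every shared access goes through an atomic call, so plain loads cannot be hoisted out of the spin loops. A full fence sits between raising one's own flag and reading the other thread's flag. cnt is updated with a separate atomic load and store rather than atomicOp, so the final count is only correct if the mutual exclusion actually holds. This is a sketch, not a verified fix for the 2010 compilers.

```d
import core.atomic;
import core.thread : ThreadGroup;
import std.stdio : writeln;

enum loops = 0xFFFF;

shared uint cnt;
shared bool flag1 = false;
shared bool flag2 = false;
shared bool turn2 = false; // true: dekkerDec has priority

void dekkerInc()
{
    atomicStore(flag1, true);
    atomicFence(); // flag1 store must be ordered before the flag2 load
    while (atomicLoad(flag2))
    {
        if (atomicLoad(turn2))
        {
            atomicStore(flag1, false);
            while (atomicLoad(turn2)) { } // wait until my turn
            atomicStore(flag1, true);
            atomicFence();
        }
    }
    // Critical section: deliberately not an atomic read-modify-write.
    atomicStore(cnt, atomicLoad(cnt) + 1);
    atomicStore(turn2, true);
    atomicStore(flag1, false);
}

void dekkerDec()
{
    atomicStore(flag2, true);
    atomicFence(); // flag2 store must be ordered before the flag1 load
    while (atomicLoad(flag1))
    {
        if (!atomicLoad(turn2))
        {
            atomicStore(flag2, false);
            while (!atomicLoad(turn2)) { } // wait until my turn
            atomicStore(flag2, true);
            atomicFence();
        }
    }
    atomicStore(cnt, atomicLoad(cnt) - 1);
    atomicStore(turn2, false);
    atomicStore(flag2, false);
}

void threadDekkerInc() { foreach (i; 0 .. loops) dekkerInc(); }
void threadDekkerDec() { foreach (i; 0 .. loops) dekkerDec(); }

void main()
{
    auto tg = new ThreadGroup;
    tg.create(&threadDekkerInc);
    tg.create(&threadDekkerDec);
    tg.joinAll();
    writeln("Dekker: ", atomicLoad(cnt) == 0 ? "passed" : "failed");
}
```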
You should ask on the digitalmars.D newsgroup (http://www.digitalmars.com/NewsGroup.html) whether this is a known issue, or report a bug (http://d.puremagic.com/issues/). – 2010-11-13 08:48:44
@Michal: I just saw that you already asked there (http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.bugs&artnum=26308). Thanks! – stephan 2010-11-14 09:13:25
Has this been added to bugzilla? – Trass3r 2012-03-12 20:42:01