Spinning thread barrier using Atomic Builtins

I am trying to implement a spinning thread barrier using atomics, specifically __sync_fetch_and_add: https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/Atomic-Builtins.html

I basically want an alternative to the pthread barrier. I am using Ubuntu on a system that can run about a hundred threads in parallel.

int bar = 0;      //global variable 
int P = MAX_THREADS;    //number of threads 

__sync_fetch_and_add(&bar,1);  //each thread comes and adds atomically 
while(bar<P){}     //threads spin until bar increments to P 
bar=0;       //a thread sets bar=0 to be used in the next spinning barrier

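This does not work, for obvious reasons (one thread may set bar = 0 while another is still stuck in the infinite while loop, and so on). I saw an implementation here: "Writing a (spinning) thread barrier using C++11 atomics", but it appears too complex and I think its performance might be worse than the pthread barrier.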

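This implementation is also expected to produce more traffic within the memory hierarchy, because the cache line holding bar is ping-ponged between threads.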
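The reset race mentioned above (one thread setting bar = 0 while another is still in the while loop) can be replayed step by step. The following is only an illustrative sketch: it runs one problematic interleaving sequentially on a single thread, and the function name naive_barrier_strands_waiter is hypothetical, not part of the original code.

```c
#include <assert.h>

/*
 * Replays one bad interleaving of the naive barrier on a single thread.
 * Returns 1 if a late waiter would be left spinning, 0 otherwise.
 * Hypothetical helper for illustration only.
 */
int naive_barrier_strands_waiter(int nthreads)
{
    assert(nthreads >= 2);  /* the race needs at least two participants */

    int bar = 0;

    /* Step 1: every thread performs its atomic increment. */
    for (int i = 0; i < nthreads; i++)
        bar += 1;

    /* Step 2: the first thread re-checks while(bar<P) and leaves the loop. */
    int first_thread_left = !(bar < nthreads);

    /* Step 3: that thread resets the counter for the next barrier. */
    bar = 0;

    /* Step 4: a slower thread evaluates while(bar<P) only now. */
    int waiter_still_spinning = (bar < nthreads);

    return first_thread_left && waiter_still_spinning;
}
```

In this interleaving the slower thread never observes bar == P, which is why the counter cannot double as both the arrival count and the release signal.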

Any ideas on how to use these atomic instructions to build a simple barrier? A communication-optimized scheme would also be helpful.

2015-11-08 masab

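Instead of spinning on the counter of threads, it is better to spin on the number of barriers passed, which is incremented only by the last thread to reach the barrier. This also reduces memory cache pressure, because the spinning variable is now updated by a single thread only.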

int P = MAX_THREADS;
int bar = 0;             // Number of threads that have arrived at the barrier.
volatile int passed = 0; // Number of barriers passed by all threads.

void barrier_wait(void)
{
    int passed_old = passed; // Must be read before incrementing *bar*!

    if (__sync_fetch_and_add(&bar, 1) == (P - 1))
    {
        // This is the last thread to arrive at the barrier.
        bar = 0;
        // *bar* must be reset strictly before the barrier counter is updated.
        __sync_synchronize();
        passed++; // Mark the barrier as passed.
    }
    else
    {
        // Not the last thread: wait for the others.
        while (passed == passed_old) {}
        // Synchronize memory with the other threads that passed the barrier.
        __sync_synchronize();
    }
}

Note that you need the volatile qualifier on the spinning variable.
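One way to gain some confidence in a barrier like this is a small stress check. The sketch below is only an illustration under stated assumptions: it repeats the barrier with a fixed NUM_THREADS in place of P, and NUM_ROUNDS, arrivals, violations, worker, and run_barrier_check are hypothetical names introduced here. Each thread checks in for a round, waits at the barrier, and then verifies that every thread has checked in.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4   /* assumed thread count for this sketch */
#define NUM_ROUNDS  50  /* assumed number of barrier phases */

static int bar = 0;              /* threads that have arrived in this phase */
static volatile int passed = 0;  /* completed phases; the only spin variable */

static void barrier_wait(void)
{
    int passed_old = passed;  /* sample the phase before announcing arrival */

    if (__sync_fetch_and_add(&bar, 1) == NUM_THREADS - 1) {
        bar = 0;               /* reset for the next phase ... */
        __sync_synchronize();  /* ... strictly before releasing the waiters */
        passed++;
    } else {
        while (passed == passed_old) { }
        __sync_synchronize();
    }
}

static int arrivals[NUM_ROUNDS];  /* one counter per round, never reset */
static int violations = 0;

static void *worker(void *arg)
{
    (void)arg;
    for (int r = 0; r < NUM_ROUNDS; r++) {
        __sync_fetch_and_add(&arrivals[r], 1);  /* check in for round r */
        barrier_wait();
        /* Past the barrier, every thread must have checked in for round r. */
        if (__sync_fetch_and_add(&arrivals[r], 0) != NUM_THREADS)
            __sync_fetch_and_add(&violations, 1);
    }
    return NULL;
}

/* Runs the check once; returns the number of violations observed. */
int run_barrier_check(void)
{
    pthread_t tid[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);  /* a missing thread would stall the others */
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    return violations;
}
```

A result of 0 only means that no violation was observed in this run; it is not a proof of correctness. Because the waiters spin without yielding, the check is best run with no more threads than available cores.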

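C++ code may be slightly faster than the C version, because it can use acquire/release memory barriers instead of a full barrier, which is the only kind of barrier available from the __sync functions: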

#include <atomic>

int P = MAX_THREADS;
std::atomic<int> bar{0};    // Number of threads that have arrived at the barrier.
std::atomic<int> passed{0}; // Number of barriers passed by all threads.

void barrier_wait()
{
    int passed_old = passed.load(std::memory_order_relaxed);

    if (bar.fetch_add(1) == (P - 1))
    {
        // This is the last thread to arrive at the barrier.
        bar = 0;
        // Synchronize and store in one operation.
        passed.store(passed_old + 1, std::memory_order_release);
    }
    else
    {
        // Not the last thread: wait for the others.
        while (passed.load(std::memory_order_relaxed) == passed_old) {}
        // Synchronize memory with the other threads that passed the barrier.
        std::atomic_thread_fence(std::memory_order_acquire);
    }
}
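For C code, GCC 4.7 and later also provide the __atomic builtins, which take an explicit memory order, so a similar acquire/release scheme can be written without C++. The following is a minimal sketch under that assumption, not the answer's own code: the spin_barrier type and the spin_barrier_init and spin_barrier_wait functions are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Hypothetical barrier type for this sketch. */
typedef struct {
    int nthreads;  /* number of participating threads */
    int bar;       /* threads that have arrived in the current phase */
    int passed;    /* number of completed phases; waiters spin on this */
} spin_barrier;

void spin_barrier_init(spin_barrier *b, int nthreads)
{
    assert(nthreads > 0);
    b->nthreads = nthreads;
    b->bar = 0;
    b->passed = 0;
}

void spin_barrier_wait(spin_barrier *b)
{
    /* Sample the phase before announcing arrival. */
    int passed_old = __atomic_load_n(&b->passed, __ATOMIC_RELAXED);

    if (__atomic_fetch_add(&b->bar, 1, __ATOMIC_SEQ_CST) == b->nthreads - 1) {
        /* Last arrival: reset the counter, then publish the new phase. */
        __atomic_store_n(&b->bar, 0, __ATOMIC_SEQ_CST);
        __atomic_store_n(&b->passed, passed_old + 1, __ATOMIC_RELEASE);
    } else {
        /* Not the last thread: spin with relaxed loads until released. */
        while (__atomic_load_n(&b->passed, __ATOMIC_RELAXED) == passed_old) { }
        /* Pair with the release store made by the last thread. */
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    }
}
```

Keeping the state in a struct lets several independent barriers coexist. The memory orders mirror the C++ version above: sequentially consistent operations on bar, a release store to passed, and an acquire fence after the spin loop.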

2015-11-09 12:47:53 Tsyvarev

OK, thanks, this works great! – masab
