2012-08-05 46 views
1

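GLSL SpinLock only Mostly Works

I have implemented a depth peeling algorithm using a GLSL spinlock (inspired by this). In the visualization below, note that the depth peeling algorithm works correctly for the most part (first layer top left, second layer top right, third layer bottom left, fourth layer bottom right). The four depth layers are stored in a single RGBA texture.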

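Unfortunately, the spinlock sometimes fails to prevent errors: you can see small white speckles, particularly in the fourth layer. There is also one on the spaceship in the second layer. The speckles change from frame to frame.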


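In my GLSL spinlock, whenever a fragment is about to be drawn, the fragment program atomically reads and writes a lock value in a separate locking texture, waiting until a 0 appears, which indicates that the lock is open. In practice, I found that the procedure has to keep all threads running in parallel rather than letting one of them spin on the lock: if two threads of the same warp are on the same pixel, the warp cannot make progress (one thread must wait while the other continues, yet all threads in a GPU warp must execute in lockstep).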

My fragment program looks like this (comments and spacing added):

#version 420 core 

//locking texture 
layout(r32ui) coherent uniform uimage2D img2D_0; 
//data texture, also render target 
layout(RGBA32F) coherent uniform image2D img2D_1; 

//Inserts "new_data" into "data", a sorted list 
vec4 insert(vec4 data, float new_data) {
    if      (new_data < data.x) return vec4(new_data, data.xyz);
    else if (new_data < data.y) return vec4(data.x, new_data, data.yz);
    else if (new_data < data.z) return vec4(data.xy, new_data, data.z);
    else if (new_data < data.w) return vec4(data.xyz, new_data);
    else                        return data;
}

void main() { 
    ivec2 coord = ivec2(gl_FragCoord.xy); 

    //The idea here is to keep looping over a pixel until a value is written. 
    //By looping over the entire logic, threads in the same warp aren't stalled 
    //by other waiting threads. The first imageAtomicExchange call sets the 
    //locking value to 1. If the locking value was already 1, then someone 
    //else has the lock, and can_write is false. If the locking value was 0, 
    //then the lock is free, and can_write is true. The depth is then read, 
    //the new value inserted, but only written if can_write is true (the 
    //locking texture was free). The second imageAtomicExchange call resets 
    //the lock back to 0. 

    bool have_written = false;
    while (!have_written) {
        bool can_write = (imageAtomicExchange(img2D_0,coord,1u) != 1u);

        memoryBarrier();

        vec4 depths = imageLoad(img2D_1,coord);
        depths = insert(depths,gl_FragCoord.z);

        if (can_write) {
            imageStore(img2D_1,coord,depths);
            have_written = true;
        }

        memoryBarrier();

        imageAtomicExchange(img2D_0,coord,0);

        memoryBarrier();
    }
    discard; //Already wrote to render target with imageStore
}

My question is: why does this speckling occur? I want the spinlock to work 100% of the time! Could it be related to where I placed my memoryBarrier() calls?

Answers

2

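The "imageAtomicExchange(img2D_0,coord,0);" call needs to be inside the if statement, because as written it resets the lock variable even for threads that do not hold the lock! Changing that fixes it.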
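
To see why the placement matters: in the question's loop, a thread that failed to take the lock still writes 0 to it, so the lock can be reopened while its real owner is still between imageLoad and imageStore, and a third thread can then enter and overwrite the owner's update. What follows is a minimal sketch of the change this answer describes, reusing the question's names; it is an illustration under that assumption, not the author's final shader, and the placement of the single memoryBarrier() is a guess. The later comments in this thread report that this change alone did not remove the speckles.

```glsl
#version 420 core

// Lock texture: 0 = free, 1 = taken (same role as in the question).
layout(r32ui) coherent uniform uimage2D img2D_0;
// Data texture holding four sorted depths per pixel.
layout(rgba32f) coherent uniform image2D img2D_1;

// Sorted insert, as in the question.
vec4 insert(vec4 data, float new_data) {
    if      (new_data < data.x) return vec4(new_data, data.xyz);
    else if (new_data < data.y) return vec4(data.x, new_data, data.yz);
    else if (new_data < data.z) return vec4(data.xy, new_data, data.z);
    else if (new_data < data.w) return vec4(data.xyz, new_data);
    else                        return data;
}

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    bool have_written = false;
    while (!have_written) {
        // We own the lock only if the value we swapped out was 0.
        if (imageAtomicExchange(img2D_0, coord, 1u) == 0u) {
            vec4 depths = imageLoad(img2D_1, coord);
            imageStore(img2D_1, coord, insert(depths, gl_FragCoord.z));

            // Assumed placement: order the store before the release.
            memoryBarrier();

            // Release the lock. Only the owner reaches this line.
            imageAtomicExchange(img2D_0, coord, 0u);
            have_written = true;
        }
    }
    discard; // Output was already written with imageStore.
}
```

The whole attempt stays inside the loop, as in the question, so a thread that loses the race simply retries on the next iteration instead of spinning in place.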

+0

What does the final fragment shader look like? Does it still contain the memoryBarrier() calls? – ragnar 2013-02-12 18:55:05

+0

Yes, but in fewer places. IIRC (the shader is procedurally generated), they come only after the two imageAtomicExchange calls. – imallett 2013-02-12 21:35:06

+0

I was actually mistaken in thinking that this solved the problem. I have put together a more complete write-up here: http://stackoverflow.com/questions/21538555/broken-glsl-spinlock-glsl-locks-compendium – imallett 2014-02-03 21:53:29

3

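For reference, here is locking code that has been tested to work with Nvidia drivers 314.22 & 320.18 on a GTX670. Note that reordering the code, or rewriting it into logically equivalent code, triggers existing compiler optimization bugs (see the comments in the code below). Note also that the code below uses bindless image references.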

// sem is initialized to zero 
coherent uniform layout(size1x32) uimage2D sem; 

void main(void) 
{ 
    ivec2 coord = ivec2(gl_FragCoord.xy); 

    bool done = false; 
    uint locked = 0; 
    while(!done)
    {
        // locked = imageAtomicCompSwap(sem, coord, 0u, 1u); will NOT work
        locked = imageAtomicExchange(sem, coord, 1u);
        if (locked == 0)
        {
            performYourCriticalSection();

            memoryBarrier();

            imageAtomicExchange(sem, coord, 0u);

            // replacing this with a break will NOT work
            done = true;
        }
    }

    discard; 
}