OpenCL reduction from private to local, then to global?

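The kernel below computes an acoustic pressure field. Each thread computes its own private instance of the pressure vector, and these instances then need to be summed into global memory. I'm fairly sure the code that computes the pressure vector is correct, but I still can't get this to produce the expected result.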

int gid  = get_global_id(0); 
int lid  = get_local_id(0); 
int nGroups = get_num_groups(0); 
int groupSize = get_local_size(0); 
int groupID = get_group_id(0); 

/* Each workitem gets private storage for the pressure field. 
* The private instances are then summed into local storage at the end.*/ 
private float2 pressure[HYD_DIM_TOTAL]; 
local float2 pressure_local[HYD_DIM_TOTAL]; 

/* Code which computes value of 'pressure' */ 

//wait for all workgroups to finish accessing any memory 
barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE); 

/// sum all results in a workgroup into local buffer: 
for(i=0; i<groupSize; i++){ 

    //each thread sums its own private instance into the local buffer 
    if (i == lid){ 
     for(iHyd=0; iHyd<HYD_DIM_TOTAL; iHyd++){ 
      pressure_local[iHyd] += pressure[iHyd]; 
     } 
    } 
    //make sure all threads in workgroup get updated values of the local buffer 
    barrier(CLK_LOCAL_MEM_FENCE); 
} 

/// copy all the results into global storage 
//1st thread in each workgroup writes the group's local buffer to global memory 
if(lid == 0){ 
    for(iHyd=0; iHyd<HYD_DIM_TOTAL; iHyd++){ 
     pressure_global[groupID +nGroups*iHyd] = pressure_local[iHyd]; 
    } 
} 

barrier(CLK_GLOBAL_MEM_FENCE); 

/// sum the various instances in global memory into a single one 
// 1st thread sums global instances 
if(gid == 0){ 

    for(iGroup=1; iGroup<nGroups; iGroup++){ 

     //we only need to sum the results from the 1st group onward 
     for(iHyd=0; iHyd<HYD_DIM_TOTAL; iHyd++){ 

      pressure_global[iHyd] += pressure_global[iGroup*HYD_DIM_TOTAL +iHyd]; 
      barrier(CLK_GLOBAL_MEM_FENCE); 
     } 
    } 
}

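Some notes on the size of the data: the total number of threads will vary between 100 and 2000, but may occasionally lie outside this interval.
groupSize will depend on the hardware, but I'm currently using values between 1 (cpu) and 32 (gpu).
HYD_DIM_TOTAL is known at compile time and varies between 4 and 32 (usually, but not necessarily, a power of 2).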

Is there anything blatantly wrong with this reduction code?

PS: I run this on an i7 3930k with AMD APP SDK 2.8 and on an NVIDIA GTX580.

Source

2013-02-19 Emanuel Ey

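I see two issues here, one big and one small:

This code suggests that you have a misunderstanding of what a barrier does. A barrier never synchronizes across multiple workgroups. It only synchronizes within a workgroup. CLK_GLOBAL_MEM_FENCE makes it look like global synchronization, but it really isn't. That flag only fences the current work item's accesses to global memory, so its outstanding writes will be globally observable after a barrier with this flag. It does not change the synchronization behavior of the barrier, which applies only at the scope of a workgroup. There is no global synchronization in OpenCL other than launching another NDRange or Task.
The first for loop causes multiple work items to overwrite each other's calculations. The indexing of pressure_local with iHyd is done by every work item with the same iHyd. This will produce undefined results.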
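
One way the two points above might be addressed is sketched below. This is a minimal illustration, not the original poster's final code: the kernel names, the `partial` and `result` buffers, the `scratch` local-memory argument, the fallback value of HYD_DIM_TOTAL, and the placeholder that fills `pressure` are all assumptions made for the example. Stage 1 reduces within a workgroup without any two work items writing the same location; stage 2 runs as a separate kernel, so the kernel boundary provides the global synchronization that barrier() cannot. Both stages use the same layout, partial[group * HYD_DIM_TOTAL + iHyd].

```c
/* Assumption: HYD_DIM_TOTAL is normally passed at build time (-D HYD_DIM_TOTAL=...). */
#ifndef HYD_DIM_TOTAL
#define HYD_DIM_TOTAL 8
#endif

/* Stage 1: each workgroup writes one partial vector to
 * partial[groupID * HYD_DIM_TOTAL + iHyd].
 * 'scratch' must hold get_local_size(0) * HYD_DIM_TOTAL float2 values;
 * the host sizes it via clSetKernelArg with a NULL pointer. */
__kernel void pressure_partial_sums(__global float2 *partial,
                                    __local float2 *scratch)
{
    const int lid       = get_local_id(0);
    const int groupSize = get_local_size(0);
    const int groupID   = get_group_id(0);

    float2 pressure[HYD_DIM_TOTAL];   /* private by default */

    /* Placeholder (hypothetical) for the code that computes 'pressure'. */
    for (int iHyd = 0; iHyd < HYD_DIM_TOTAL; iHyd++)
        pressure[iHyd] = (float2)(1.0f, 0.0f);

    /* Each work item publishes its vector into its own slice of local memory. */
    for (int iHyd = 0; iHyd < HYD_DIM_TOTAL; iHyd++)
        scratch[lid * HYD_DIM_TOTAL + iHyd] = pressure[iHyd];

    /* Workgroup-scope synchronization only: all slices are now visible. */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Each component is summed by exactly one work item, so no two
     * work items ever write the same location. The strided loop also
     * covers groupSize < HYD_DIM_TOTAL (e.g. groupSize == 1 on a CPU). */
    for (int iHyd = lid; iHyd < HYD_DIM_TOTAL; iHyd += groupSize) {
        float2 sum = (float2)(0.0f, 0.0f);
        for (int i = 0; i < groupSize; i++)
            sum += scratch[i * HYD_DIM_TOTAL + iHyd];
        partial[groupID * HYD_DIM_TOTAL + iHyd] = sum;
    }
}

/* Stage 2: a separate kernel, enqueued after stage 1 on the same in-order
 * queue. Launch with a global size of at least HYD_DIM_TOTAL. */
__kernel void pressure_final_sum(__global const float2 *partial,
                                 __global float2 *result,
                                 const int nGroups)
{
    const int iHyd = get_global_id(0);
    if (iHyd >= HYD_DIM_TOTAL)
        return;

    float2 sum = (float2)(0.0f, 0.0f);
    for (int iGroup = 0; iGroup < nGroups; iGroup++)
        sum += partial[iGroup * HYD_DIM_TOTAL + iHyd];
    result[iHyd] = sum;
}
```

With the group sizes mentioned in the question (up to 32) and HYD_DIM_TOTAL up to 32, the scratch buffer needs at most 32 * 32 * sizeof(float2) = 8 KB of local memory. If the global size is padded beyond the real number of threads, the extra work items would still have to reach the barrier and contribute zeros.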

Hope this helps.

Source

2013-02-19 18:46:40 boiler96

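Thanks for your answer, and sorry for my late reply: I actually used to have the last loop in a separate kernel. I let the code sit for a while and forgot why I had done that - thanks for reminding me :p I'll split it off into a separate kernel again and take another look at the first loop – 2013-02-24 16:19:33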

Hey, I have the same problem and was wondering how you managed to get it working. – Eric 2015-03-13 04:10:23
