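OpenCL reduction from private to local then global?
The kernel below computes an acoustic pressure field. Each thread computes its own private instance of the pressure vector, which then needs to be summed into global memory. I'm pretty sure the code that computes the pressure vector is correct, but I still can't get this to produce the expected result.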
int gid = get_global_id(0);
int lid = get_local_id(0);
int nGroups = get_num_groups(0);
int groupSize = get_local_size(0);
int groupID = get_group_id(0);

/* Each workitem gets private storage for the pressure field.
 * The private instances are then summed into local storage at the end.*/
private float2 pressure[HYD_DIM_TOTAL];
local float2 pressure_local[HYD_DIM_TOTAL];

/* Code which computes value of 'pressure' */

//wait for all workgroups to finish accessing any memory
barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);

/// sum all results in a workgroup into local buffer:
for(i=0; i<groupSize; i++){
    //each thread sums its own private instance into the local buffer
    if (i == lid){
        for(iHyd=0; iHyd<HYD_DIM_TOTAL; iHyd++){
            pressure_local[iHyd] += pressure[iHyd];
        }
    }
    //make sure all threads in workgroup get updated values of the local buffer
    barrier(CLK_LOCAL_MEM_FENCE);
}

/// copy all the results into global storage
//1st thread in each workgroup writes the group's local buffer to global memory
if(lid == 0){
    for(iHyd=0; iHyd<HYD_DIM_TOTAL; iHyd++){
        pressure_global[groupID +nGroups*iHyd] = pressure_local[iHyd];
    }
}
barrier(CLK_GLOBAL_MEM_FENCE);

/// sum the various instances in global memory into a single one
// 1st thread sums global instances
if(gid == 0){
    for(iGroup=1; iGroup<nGroups; iGroup++){
        //we only need to sum the results from the 1st group onward
        for(iHyd=0; iHyd<HYD_DIM_TOTAL; iHyd++){
            pressure_global[iHyd] += pressure_global[iGroup*HYD_DIM_TOTAL +iHyd];
            barrier(CLK_GLOBAL_MEM_FENCE);
        }
    }
}
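Some notes on the dimensions of the data:
The total number of threads will vary between 100 and 2000, but may occasionally lie outside this interval.
groupSize will depend on the hardware, but I'm currently using values between 1 (CPU) and 32 (GPU).
HYD_DIM_TOTAL is known at compile time and varies between 4 and 32 (usually, but not necessarily, a power of 2).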
Is there anything blatantly wrong with this reduction code?
PS: I run this on an i7 3930k with AMD APP SDK 2.8 and on an NVIDIA GTX580.
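Thanks for your answer, and sorry for my late reply: I actually used to have the last loop in a separate kernel. I let the code sit for a while and forgot why I had done that - thanks for reminding me :p I will split it out into a separate kernel again and take another look at the first loop – 2013-02-24 16:19:33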
Hey, I have the same problem and was wondering how you managed to get it working. – Eric 2015-03-13 04:10:23