Strange OpenCL kernel behavior

I'm still new to OpenCL and am running some tests based on the Nvidia samples. The whole program consists of 5 kernels that are executed in order (1, 2, 3, 4, 5).

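The first kernel takes simple position and velocity data, applies gravity and basic collision detection, and then updates the positions and velocities. This kernel works perfectly, with no problems.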

Here is the first kernel:

__kernel void integrate(
__global float4 *d_Pos, //input/output 
__global float4 *d_Vel, //input/output 
__constant simParams_t *params, 
float deltaTime, 
uint numParticles 
){ 
const uint index = get_global_id(0); 
if(index >= numParticles) 
    return; 

float4 pos = d_Pos[index]; 
float4 vel = d_Vel[index]; 

pos.w = 1.0f; 
vel.w = 0.0f; 

//Gravity 
vel += (float4)(params->gravity.x, params->gravity.y, params->gravity.z, 0) * deltaTime; 
vel *= params->globalDamping; 

//Advance pos 
pos += vel * deltaTime; 


//Collide with cube 
if(pos.x < -1.0f + params->particleRadius){ 
    pos.x = -1.0f + params->particleRadius; 
    vel.x *= params->boundaryDamping; 
} 
if(pos.x > 1.0f - params->particleRadius){ 
    pos.x = 1.0f - params->particleRadius; 
    vel.x *= params->boundaryDamping; 
} 

if(pos.y < -1.0f + params->particleRadius){ 
    pos.y = -1.0f + params->particleRadius; 
    vel.y *= params->boundaryDamping; 
} 
if(pos.y > 1.0f - params->particleRadius){ 
    pos.y = 1.0f - params->particleRadius; 
    vel.y *= params->boundaryDamping; 
} 

if(pos.z < -1.0f + params->particleRadius){ 
    pos.z = -1.0f + params->particleRadius; 
    vel.z *= params->boundaryDamping; 
} 
if(pos.z > 1.0f - params->particleRadius){ 
    pos.z = 1.0f - params->particleRadius; 
    vel.z *= params->boundaryDamping; 
} 

//Store new position and velocity 
d_Pos[index] = pos; 
d_Vel[index] = vel; 
}
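
One way to check whether kernel 1's output really changed is to run the same arithmetic on the host and compare it with what is read back from `d_Pos` and `d_Vel`. Below is a minimal plain-C sketch of a single work-item. The `RefParams` struct and the names `collide_axis` and `integrate_ref` are hypothetical; `RefParams` only mirrors the `simParams_t` fields the kernel reads and is not the sample's actual struct layout.

```c
#include <assert.h>

/* Hypothetical host-side mirror of the simParams_t fields used by integrate(). */
typedef struct {
    float gravity[3];
    float globalDamping;
    float boundaryDamping;
    float particleRadius;
} RefParams;

/* Clamp one coordinate to [-1 + r, 1 - r] and damp the velocity on contact. */
static void collide_axis(float *p, float *v, float radius, float damping)
{
    if (*p < -1.0f + radius) { *p = -1.0f + radius; *v *= damping; }
    if (*p >  1.0f - radius) { *p =  1.0f - radius; *v *= damping; }
}

/* CPU version of one integrate() work-item; pos and vel hold x, y, z, w. */
void integrate_ref(float pos[4], float vel[4], const RefParams *prm, float dt)
{
    assert(pos && vel && prm);
    pos[3] = 1.0f;
    vel[3] = 0.0f;
    for (int i = 0; i < 3; ++i) {
        vel[i] += prm->gravity[i] * dt;   /* gravity */
        vel[i] *= prm->globalDamping;     /* global damping */
        pos[i] += vel[i] * dt;            /* advance position */
    }
    for (int i = 0; i < 3; ++i)           /* collide with the unit cube */
        collide_axis(&pos[i], &vel[i], prm->particleRadius, prm->boundaryDamping);
}
```

Calling `integrate_ref` on a copy of the initial data for a few particles and comparing against the device buffers after each kernel shows at which step the positions stop matching.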

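The second kernel takes those positions as input and outputs a different kind of data (some indices), but it does not change the position data.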

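The third kernel adjusts the output of the second kernel (its input comes from the second kernel, which does not touch the position data).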

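Now the problem: the fourth kernel. It takes the position and velocity data (from the first kernel) and the adjusted data from the third kernel, and outputs another set of position and velocity data (through completely different pointers from the input positions and velocities).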

Here is the fourth kernel:

__kernel void findCellBoundsAndReorder(
__global uint *d_CellStart,  //output: cell start index 
__global uint *d_CellEnd,  //output: cell end index 
__global float4 *d_ReorderedPos, //output: reordered by cell hash positions 
__global float4 *d_ReorderedVel, //output: reordered by cell hash velocities 

__global const uint *d_Hash, //input: sorted grid hashes 
__global const uint *d_Index, //input: particle indices sorted by hash 
__global const float4 *d_Pos,  //input: positions array sorted by hash 
__global const float4 *d_Vel,  //input: velocity array sorted by hash 
__local uint *localHash,   //get_local_size(0) + 1 elements 
uint numParticles 
){ 
uint hash; 
const uint index = get_global_id(0); 

//Handle case when no. of particles not multiple of block size 
if(index < numParticles){ 
    hash = d_Hash[index]; 

    //Load hash data into local memory so that we can look 
    //at neighboring particle's hash value without loading 
    //two hash values per thread 
    localHash[get_local_id(0) + 1] = hash; 

    //First thread in block must load neighbor particle hash 
    if(index > 0 && get_local_id(0) == 0) 
     localHash[0] = d_Hash[index - 1]; 
} 

barrier(CLK_LOCAL_MEM_FENCE); 

if(index < numParticles){ 
    //Border case 
    if(index == 0) 
     d_CellStart[hash] = 0; 

    //Main case 
    else{ 
     if(hash != localHash[get_local_id(0)]) 
      d_CellEnd[localHash[get_local_id(0)]] = d_CellStart[hash] = index; 
    } 

    //Another border case 
    if(index == numParticles - 1) 
     d_CellEnd[hash] = numParticles; 


    //Now use the sorted index to reorder the pos and vel arrays 
    uint sortedIndex = d_Index[index]; 
    float4 pos = d_Pos[sortedIndex]; 
    float4 vel = d_Vel[sortedIndex]; 

    d_ReorderedPos[index] = pos; 
    d_ReorderedVel[index] = vel; 
} 
}
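
For comparison, here is a sequential plain-C sketch of what this kernel computes, which can serve as a host-side reference when checking the device output. The function name `find_cell_bounds_ref` is hypothetical, and each particle is assumed to occupy 4 consecutive floats, matching `float4`. Because the inputs are `const`, it also makes explicit that this step only reads `pos` and `vel`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sequential equivalent of findCellBoundsAndReorder.
   pos, vel, reorderedPos and reorderedVel hold 4 floats per particle.
   hash must already be sorted; index maps sorted slot -> original particle. */
void find_cell_bounds_ref(unsigned *cellStart, unsigned *cellEnd,
                          float *reorderedPos, float *reorderedVel,
                          const unsigned *hash, const unsigned *index,
                          const float *pos, const float *vel,
                          unsigned numParticles)
{
    assert(cellStart && cellEnd && hash && index && pos && vel);
    for (unsigned i = 0; i < numParticles; ++i) {
        unsigned h = hash[i];

        if (i == 0) {
            cellStart[h] = 0;               /* border case */
        } else if (h != hash[i - 1]) {
            cellStart[h] = i;               /* a new cell begins here */
            cellEnd[hash[i - 1]] = i;       /* so the previous one ends */
        }
        if (i == numParticles - 1)
            cellEnd[h] = numParticles;      /* last cell ends at the array end */

        /* Gather position and velocity through the sorted index. */
        size_t src = (size_t)index[i] * 4;
        size_t dst = (size_t)i * 4;
        memcpy(&reorderedPos[dst], &pos[src], 4 * sizeof(float));
        memcpy(&reorderedVel[dst], &vel[src], 4 * sizeof(float));
    }
}
```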

The problem is that if I execute kernel 1 alone (or 1 + 2, or 1 + 2 + 3), the positions and velocities are adjusted correctly by the first kernel.

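But if I execute kernels 1 + 2 + 3 + 4 (even though kernel 4 does not change its input data), the data stays unchanged, as if I had executed nothing at all: the positions are not adjusted.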

2012-12-25 Mohamed Sakr Abo Youcuf

Are you using an out-of-order command queue? – matthias

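I don't know what you mean, since I'm still new :D But I'm using a library (libstdcl) that is simpler than plain OpenCL calls... I call each kernel separately (so kernel 1 finishes executing, then kernel 2 is launched), because when I put them all in the queue I get error (-5), which I guess is a not-enough-memory problem –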
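
For reference, in the OpenCL headers status code -5 is `CL_OUT_OF_RESOURCES`, which is distinct from `CL_MEM_OBJECT_ALLOCATION_FAILURE` (-4) and `CL_OUT_OF_HOST_MEMORY` (-6). It is commonly reported when a kernel launch fails on the device, not only when memory runs short, so decoding codes by name can help. A small lookup sketch in plain C follows; the function name `cl_error_name` is hypothetical and only a handful of codes are listed.

```c
/* Names for a few OpenCL 1.x status codes, with the values from CL/cl.h. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -55: return "CL_INVALID_WORK_ITEM_SIZE";
    case -63: return "CL_INVALID_GLOBAL_WORK_SIZE";
    default:  return "unlisted OpenCL status code";
    }
}
```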

Please add the relevant code that calls these kernels (and where you create the queue) –

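OK, I figured out the problem: I had set the local work-group size incorrectly for the 3rd kernel. After fixing it, everything works correctly. Sorry about that.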
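
Since the cause was a wrong local work-group size, a few host-side sanity checks before each launch may help catch this kind of mistake early. The sketch below is plain C with hypothetical helper names. It assumes a 1D launch and OpenCL 1.x rules, where the global size must be a multiple of the local size and the local size must not exceed the device's maximum work-group size. It also assumes `uint` in the kernel is 32 bits wide on the host side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest multiple of local_size that is >= n. */
size_t round_up_global(size_t n, size_t local_size)
{
    assert(local_size > 0);
    size_t r = n % local_size;
    return r == 0 ? n : n + (local_size - r);
}

/* Returns 1 when (global, local) is a legal 1D launch on a device whose
   maximum work-group size is max_wg, otherwise 0. */
int launch_config_ok(size_t global, size_t local, size_t max_wg)
{
    if (local == 0 || local > max_wg)
        return 0;
    return global % local == 0;
}

/* Bytes of __local memory that findCellBoundsAndReorder needs for localHash:
   one element per work-item plus one for the neighbor's hash. */
size_t local_hash_bytes(size_t local)
{
    return (local + 1) * sizeof(uint32_t);
}
```

For example, 1000 particles with a local size of 256 would need a global size of `round_up_global(1000, 256)`, and the kernels' `index < numParticles` checks handle the padding work-items.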

2012-12-30 19:26:41
