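Crash on second iteration, regardless of order

To save on global memory transfers, and because every step of the code works on its own, I have been trying to combine all of the kernels into a single kernel, with the first 2 (of 3) steps performed as device calls rather than global calls. This fails in the second half of the first step.
There is a function that I need to call twice, to compute the two halves of the image. Regardless of the order in which the halves are computed, it crashes on the second iteration.
After examining the code closely and running it many times with different return points, I found what makes it crash.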
    __device__
    void IntersectCone(float* ModDistance,
                       float* ModIntensity,
                       float3 ray,
                       int threadID,
                       modParam param)
    {
        bool ignore = false;
        float3 normal = make_float3(0.0f,0.0f,0.0f);
        float3 result = make_float3(0.0f,0.0f,0.0f);
        float normDist = 0.0f;
        float intensity = 0.0f;
        float check = abs(Dot(param.position, Cross(param.direction,ray)));
        if(check > param.r1 && check > param.r2)
            ignore = true;
        float tran = param.length/(param.r2/param.r1 - 1);
        float length = tran + param.length;
        float Lsq = length * length;
        float cosSqr = Lsq/(Lsq + param.r2 * param.r2);
        //Changes the centre position?
        float3 position = param.position - tran * param.direction;
        float aDd = Dot(param.direction, ray);
        float3 e = position * -1.0f;
        float aDe = Dot(param.direction, e);
        float dDe = Dot(ray, e);
        float eDe = Dot(e, e);
        float c2 = aDd * aDd - cosSqr;
        float c1 = aDd * aDe - cosSqr * dDe;
        float c0 = aDe * aDe - cosSqr * eDe;
        float discr = c1 * c1 - c0 * c2;
        if(discr <= 0.0f)
            ignore = true;
        if(!ignore)
        {
            float root = sqrt(discr);
            float sign;
            if(c1 > 0.0f)
                sign = 1.0f;
            else
                sign = -1.0f;
            //Try opposite sign....?
            float3 result = (-c1 + sign * root) * ray/c2;
            e = result - position;
            float dot = Dot(e, param.direction);
            float3 s1 = Cross(e, param.direction);
            float3 normal = Cross(e, s1);
            if((dot > tran) || (dot < length))
            {
                if(Dot(normal,ray) <= 0)
                {
                    normal = Norm(normal); //This stuff (1)
                    normDist = Magnitude(result);
                    intensity = -IntensAt1m * Dot(ray, normal)/(normDist * normDist);
                }
            }
        }
        ModDistance[threadID] = normDist; //and this stuff (2)
        ModIntensity[threadID] = intensity;
    }
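There are two things I can do to stop this from crashing, both of which defeat the purpose of the function: not writing to ModDistance[] and ModIntensity[], or not writing to normDist and intensity.
The code above throws a first-chance exception, but not if either of the two marked blocks is commented out. Also, the program only crashes on the second call to this routine.
I have been trying to figure this out all day; any help would be fantastic.
The code that calls it is: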
    int subrow = threadIdx.y + Mod_Height/2;
    int threadID = subrow * (Mod_Width+1) + threadIdx.x;
    int obsY = windowY + subrow;
    float3 ray = CalculateRay(obsX,obsY);
    if(!IntersectSphere(ModDistance, ModIntensity, ray, threadID, param))
    {
        IntersectCone(ModDistance, ModIntensity, ray, threadID, param);
    }
    subrow = threadIdx.y;
    threadID = subrow * (Mod_Width+1) + threadIdx.x;
    obsY = windowY + subrow;
    ray = CalculateRay(obsX,obsY);
    if(!IntersectSphere(ModDistance, ModIntensity, ray, threadID, param))
    {
        IntersectCone(ModDistance, ModIntensity, ray, threadID, param);
    }
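The symptom of the error showed up in a different place from its source. The kernel as a whole was too large, so not enough registers could be allocated. – 3Pi 2012-03-01 02:43:04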
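As a rough illustration of the register limit described in the comment above, here is a minimal sketch of the arithmetic involved. It is deliberately simplified: real hardware rounds register allocation up in warp-sized chunks, so the true limit can be somewhat lower. The function names `FitsRegisterBudget` and `MaxThreadsForBudget` are made up for this example. In a real program the inputs would typically come from `cudaFuncGetAttributes` (the `numRegs` field) and `cudaGetDeviceProperties` (the `regsPerBlock` field), and a launch that does not fit is reported through `cudaGetLastError`.

```cpp
#include <cassert>

// Simplified check, NOT the exact hardware rule (allocation granularity is ignored).
// regsPerThread   : registers the compiled kernel needs per thread
// threadsPerBlock : blockDim.x * blockDim.y * blockDim.z
// regsPerBlock    : register budget available to a single block
inline bool FitsRegisterBudget(int regsPerThread, int threadsPerBlock, int regsPerBlock)
{
    assert(regsPerThread > 0 && threadsPerBlock > 0 && regsPerBlock > 0);
    // 64-bit product so large inputs cannot overflow.
    long long needed = static_cast<long long>(regsPerThread) * threadsPerBlock;
    return needed <= regsPerBlock;
}

// Largest block size, rounded down to a multiple of the warp size (32),
// that still fits in the budget. Returns 0 if not even one warp fits.
inline int MaxThreadsForBudget(int regsPerThread, int regsPerBlock)
{
    assert(regsPerThread > 0 && regsPerBlock > 0);
    int maxThreads = regsPerBlock / regsPerThread;
    return (maxThreads / 32) * 32;
}
```

Merging several kernels into one usually raises the per-thread register count, so a block size that worked for each small kernel may no longer fit the merged one. Compiling with `nvcc -Xptxas -v` prints the register usage per kernel, which gives the first input to this check.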
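The two lines you called out might provide a starting point. (1) What does the 'Norm()' function look like? I would expect it to return a scalar, not a vector? Same question for 'Magnitude()'. (2) I can only see this line crashing if 'threadID' is out of range. – 2012-03-01 03:38:35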
Just saw your comment. Nice find. – 2012-03-01 03:39:24
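For anyone checking the two points raised in the comments, the sketch below shows one way the helpers and the final store could look. The thread never shows the real `Dot`, `Norm`, or `Magnitude`, so these versions are assumptions, chosen to match how the question uses them (`Norm` returning a vector, `Magnitude` returning a scalar). `StoreChecked` is a hypothetical helper, and the buffer size it takes is something the caller has to supply. The `HD` macro and the stand-in `float3` only exist so the same code compiles with nvcc and as plain C++.

```cpp
#include <cassert>
#include <math.h>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
struct float3 { float x, y, z; };   // stand-in for CUDA's built-in float3
#endif

// Assumed helper: dot product of two vectors.
HD inline float Dot(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Assumed helper: scalar length of v.
HD inline float Magnitude(float3 v) { return sqrtf(Dot(v, v)); }

// Assumed helper: unit vector along v.
// A zero-length vector is returned unchanged to avoid dividing by zero.
HD inline float3 Norm(float3 v)
{
    float len = Magnitude(v);
    if (len == 0.0f) return v;
    float3 r = { v.x / len, v.y / len, v.z / len };
    return r;
}

// Hypothetical guarded store: writes value only if index lies in [0, count).
// Returns false and leaves the buffer untouched otherwise.
HD inline bool StoreChecked(float* buffer, int count, int index, float value)
{
    assert(count >= 0);
    if (index < 0 || index >= count) return false;
    buffer[index] = value;
    return true;
}
```

With the indexing from the question, `subrow * (Mod_Width+1) + threadIdx.x`, a guard like this would rule out an out-of-range `threadID` as the cause, provided the element count passed in matches how `ModDistance` and `ModIntensity` were actually allocated.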