For loop in a device function on compute capability 1.1 devices

I wrote a __device__ function that uses a for loop. It works on a GTX 460 card (compute capability 2.1), but not on a 9500 GT (compute capability 1.1).

The function looks roughly like this:

__device__ void myFuncD(float4 *myArray, float4 *result, uint index, uint foo, uint *here, uint *there) 
{ 
    uint j; 
    float4 myValue = myArray[index]; 
    uint idxHere = here[foo]; 
    uint idxThere = there[foo]; 
    float4 temp; 

    for (j = idxHere; j < idxThere; j++) {
        temp = myArray[j];

        //do things with myValue and temp, write result to *result
        result->x += /* some calculations with myValue.x and temp.x */
        result->y += /* some calculations with myValue.y and temp.y */
        result->z += /* some calculations with myValue.z and temp.z */
    }
} 

__global__ void myKernelD(float4 *myArray, float4 *myResults, uint *here, uint *there) 
{ 
    uint index = blockDim.x*blockIdx.x+threadIdx.x; 

    float4 result = make_float4(0.0f, 0.0f, 0.0f, 0.0f);
    uint foo1, foo2, foo3, foo4; 

    //compute foo1, foo2, foo3, foo4 based on myArray[index] 

    myFuncD(myArray, &result, index, foo1, here, there); 
    myFuncD(myArray, &result, index, foo2, here, there); 
    myFuncD(myArray, &result, index, foo3, here, there); 
    myFuncD(myArray, &result, index, foo4, here, there); 

    myResults[index] = result; 
}

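On the GTX 460, myResults contains the correct values, but on the 9500 GT every component of every element is zero.

How can I achieve the same result on a compute capability 1.1 device?

Source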

2012-08-29 user1411287

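What do you mean, specifically, by "it doesn't work on the 9500 GT"? I don't see anything in the code that is illegal on SM 1.1. In particular, I don't see the recursion-like behavior mentioned in the title. – harrism

So now you have changed the question considerably, and all mention of recursion is gone. But you still haven't said what doesn't work on the 1.1 device. Please edit your question again to include a description of the problem. – talonmies

I meant the 'for' loop. Sorry. I had read a discussion on another question about SM 1.1 not supporting recursion and mixed up the terms. Also, the '__device__' function is a 'void' function, and 'result' is accessed using the '->' operator. On the 9500 GT, every element of 'myResults' is (0.0, 0.0, 0.0, 0.0). Side question: am I correct in assuming that using float4 is better than using float3, even if I don't need the w component? – user1411287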

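The user was trying to launch the kernel with too many threads per block and was getting the error "too many resources requested for launch". Reducing the number of threads per block allowed the kernel to launch.

Source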

2012-09-11 03:53:10 harrism
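
The diagnosis above can be sketched in code. This is a minimal illustration, not the asker's program: it assumes the all-zero output came from a launch that failed without its error being checked. The kernel body is a placeholder, and the names CHECK, clampBlockSize and n are hypothetical. It asks the runtime how many threads per block this particular kernel can use on the current device, caps the requested block size accordingly, and checks for errors after the launch.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a readable message on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e));                           \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel standing in for myKernelD; the real body is not shown here.
__global__ void myKernelD(const float4 *myArray, float4 *myResults, unsigned int n)
{
    unsigned int index = blockDim.x * blockIdx.x + threadIdx.x;
    if (index < n) {
        myResults[index] = myArray[index];
    }
}

// Hypothetical host helper: cap the requested block size at the kernel's limit.
int clampBlockSize(int requested, int maxThreadsPerBlock)
{
    return requested < maxThreadsPerBlock ? requested : maxThreadsPerBlock;
}

int main()
{
    const unsigned int n = 1 << 16;  // illustrative problem size
    float4 *dIn = NULL;
    float4 *dOut = NULL;
    CHECK(cudaMalloc((void **)&dIn, n * sizeof(float4)));
    CHECK(cudaMalloc((void **)&dOut, n * sizeof(float4)));
    CHECK(cudaMemset(dIn, 0, n * sizeof(float4)));

    // Per-kernel limit: depends on the registers and shared memory the
    // compiled kernel needs, so it can be lower than the device maximum.
    cudaFuncAttributes attr;
    CHECK(cudaFuncGetAttributes(&attr, myKernelD));
    printf("registers per thread: %d, max threads per block: %d\n",
           attr.numRegs, attr.maxThreadsPerBlock);

    int block = clampBlockSize(512, attr.maxThreadsPerBlock);
    int grid = (int)((n + block - 1) / block);

    myKernelD<<<grid, block>>>(dIn, dOut, n);
    CHECK(cudaGetLastError());       // reports launch failures such as
                                     // "too many resources requested for launch"
    CHECK(cudaDeviceSynchronize());  // reports errors raised during execution

    CHECK(cudaFree(dIn));
    CHECK(cudaFree(dOut));
    return 0;
}
```

Without the cudaGetLastError() call, a failed launch goes unnoticed and the output buffer is never written, which would be consistent with the zeros seen on the 9500 GT.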
