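Counting the number of cycles in a CUDA kernel

How can I count the number of cycles needed to execute a function like the one below? Should I simply count the number of additions, multiplications, and divisions? Where can I look up how many cycles an addition takes in CUDA?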
    __global__
    void mandelbrotSet_per_element(Grayscale *image){
        float minR = -2.0f, maxR = 1.0f;
        float minI = -1.2f, maxI = minI + (maxR-minR) * c_rows/c_cols;
        float realFactor = (maxR - minR)/(c_cols-1);
        float imagFactor = (maxI - minI)/(c_rows-1);
        bool isInSet;
        float c_real, c_imag, z_real, z_imag;
        int y = blockDim.y * blockIdx.y + threadIdx.y;
        int x = blockDim.x * blockIdx.x + threadIdx.x;
        while (y < c_rows){
            while (x < c_cols) {
                c_real = minR + x * realFactor;
                c_imag = maxI - y * imagFactor;
                z_real = c_real; z_imag = c_imag;
                isInSet = true;
                for (int k = 0; k < c_iterations; k++){
                    float z_real2 = z_real * z_real;
                    float z_imag2 = z_imag * z_imag;
                    if (z_real2 + z_imag2 > 4){
                        isInSet = false;
                        break;
                    }
                    z_imag = 2 * z_real * z_imag + c_imag;
                    z_real = z_real2 - z_imag2 + c_real;
                }
                if (isInSet) image[y*c_cols+x] = 255;
                else image[y*c_cols+x] = 0;
                x += blockDim.x * gridDim.x;
            }
            x = blockDim.x * blockIdx.x + threadIdx.x;
            y += blockDim.y * gridDim.y;
        }
    }
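Thanks a lot. So, for example, if the kernel performs 8 additions, is the number of cycles required 8/32, going by the instruction throughput? – BRabbit27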
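If the kernel performs 8 SP FP additions *in a single thread* (i.e. sequentially) on a CC 2.0 device, then, ignoring other factors (ILP, data stalls, register contention, silly compiler tricks, etc.), it should take 8 clocks. More precisely, the SM can retire 1 per clock (that is what *throughput* means). If all threads in a warp are executing the same 8 additions, it won't take any longer (with the same caveats). If only 8 threads in the warp are executing those same 8 additions, it won't take any less time. –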
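One way to check an estimate like this on real hardware, rather than by counting instructions, is to read the per-SM cycle counter with `clock64()` around the code of interest. The following is only a minimal sketch under some assumptions: the kernel `timeAdds` and the host helper `measureAddCycles` are hypothetical names, not part of the question, and the launch uses a single warp of 32 threads. Because the 8 additions form a dependent chain, the number it reports reflects latency plus the overhead of reading the counter, so it will likely be larger than the 8 clocks suggested by throughput alone.

```cuda
#include <cuda_runtime.h>

// Hypothetical micro-benchmark kernel: each thread performs a chain of
// 8 dependent single-precision additions between two reads of the SM
// cycle counter. The compiler is free to move instructions around the
// clock64() reads, so treat the result as a rough indication only.
__global__ void timeAdds(float *out, long long *cycles, float seed)
{
    float acc = seed;

    long long start = clock64();
    #pragma unroll
    for (int i = 0; i < 8; ++i)
        acc += seed;              // 8 dependent SP FP additions
    long long stop = clock64();

    // Store the result so the additions cannot be optimized away.
    out[threadIdx.x] = acc;

    // One thread reports the elapsed cycles as seen by its warp.
    if (threadIdx.x == 0)
        *cycles = stop - start;
}

// Hypothetical host helper: launches one warp and returns the measured
// cycle count, or -1 if any CUDA call fails.
long long measureAddCycles()
{
    float *dOut = nullptr;
    long long *dCycles = nullptr;
    long long hCycles = -1;

    if (cudaMalloc(&dOut, 32 * sizeof(float)) != cudaSuccess)
        return -1;
    if (cudaMalloc(&dCycles, sizeof(long long)) != cudaSuccess) {
        cudaFree(dOut);
        return -1;
    }

    timeAdds<<<1, 32>>>(dOut, dCycles, 1.0f);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(&hCycles, dCycles, sizeof(long long),
                                 cudaMemcpyDeviceToHost);

    cudaFree(dOut);
    cudaFree(dCycles);
    return (err == cudaSuccess) ? hCycles : -1;
}
```

Calling `measureAddCycles()` from `main` and printing the result, then changing the block size from 32 to 8 threads, gives a direct way to test the claim that a partially filled warp does not finish any sooner. It is worth inspecting the generated SASS (for example with `cuobjdump -sass`) to confirm the additions really sit between the two counter reads.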