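In your case, the correct conversion does not divide by the clock rate in GHz, as the 1.62 divisor does here:
fprintf(stdout, "%d:%ld=%f(ms)\n", i,runtime[i], (runtime[i]/1.62)*1000.0);
Instead, divide by the clock rate in Hz, 1620000000.0f:
fprintf(stdout, "%d:%ld=%f(ms)\n", i,runtime[i], (runtime[i]/1620000000.0f)*1000.0);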
Using dimensional analysis:
$$\text{clock cycles} \,\Big/\, \frac{\text{clock cycles}}{\text{second}} = \text{seconds}$$
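The first term is the clock-cycle measurement, the second term is the GPU's frequency (in Hz, not GHz), and the third term is the desired measurement (seconds). You can convert seconds to milliseconds by multiplying by 1000. Here is a worked example that shows a device-independent way to do this (so you don't have to hard-code the clock frequency):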
$ cat t1306.cu
#include <stdio.h>
#include <stdlib.h>
const long long delay_time = 1000000000;  // clock cycles to spin for
const int nthr = 1;                       // threads whose result is recorded
const int nTPB = 256;                     // threads per block
// Each thread spins for delay_time SM clock cycles; the first nthr threads
// record how many cycles actually elapsed.
__global__ void kernel(long long *clocks){
  int idx=threadIdx.x+blockDim.x*blockIdx.x;
  long long start=clock64();
  while (clock64() < start+delay_time);
  if (idx < nthr) clocks[idx] = clock64()-start;
}
int main(){
  int peak_clk = 1;
  int device = 0;
  long long *clock_data;
  long long *host_data;
  host_data = (long long *)malloc(nthr*sizeof(long long));
  // cudaDevAttrClockRate reports the peak clock rate in kHz
  cudaError_t err = cudaDeviceGetAttribute(&peak_clk, cudaDevAttrClockRate, device);
  if (err != cudaSuccess) {printf("cuda err: %d at line %d\n", (int)err, __LINE__); return 1;}
  err = cudaMalloc(&clock_data, nthr*sizeof(long long));
  if (err != cudaSuccess) {printf("cuda err: %d at line %d\n", (int)err, __LINE__); return 1;}
  kernel<<<(nthr+nTPB-1)/nTPB, nTPB>>>(clock_data);
  // cudaMemcpy waits for the kernel to finish before copying
  err = cudaMemcpy(host_data, clock_data, nthr*sizeof(long long), cudaMemcpyDeviceToHost);
  if (err != cudaSuccess) {printf("cuda err: %d at line %d\n", (int)err, __LINE__); return 1;}
  // cycles / kHz = ms, because a rate in kHz counts cycles per millisecond
  printf("delay clock cycles: %lld, measured clock cycles: %lld, peak clock rate: %dkHz, elapsed time: %fms\n", delay_time, host_data[0], peak_clk, host_data[0]/(float)peak_clk);
  free(host_data);
  cudaFree(clock_data);
  return 0;
}
$ nvcc -arch=sm_35 -o t1306 t1306.cu
$ ./t1306
delay clock cycles: 1000000000, measured clock cycles: 1000000210, peak clock rate: 732000kHz, elapsed time: 1366.120483ms
$
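This uses cudaDeviceGetAttribute to obtain the clock rate, which it reports in kHz. That makes computing milliseconds easy in this case: a rate in kHz is thousands of cycles per second, i.e. cycles per millisecond, so clock cycles divided by the kHz value gives milliseconds directly.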
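Divide by the number of Hz, not GHz: divide by 1620000000.0f. Clock cycles divided by clock cycles per second gives you seconds. Multiply seconds by 1000 to get milliseconds. –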
@RobertCrovella, it works as expected now, thanks! If you post this as an answer, I'll gladly mark it as accepted. – John