nvprof not picking up any API calls or kernels

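I want to use nvprof to get some benchmark timings for my CUDA program, but unfortunately it does not seem to profile any API calls or kernels. To make sure I was doing it right, I looked for a simple beginner's example and found one on the Nvidia developer blog: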

https://devblogs.nvidia.com/parallelforall/how-optimize-data-transfers-cuda-cc/

Code:

int main() 
{ 
    const unsigned int N = 1048576; 
    const unsigned int bytes = N * sizeof(int); 
    int *h_a = (int*)malloc(bytes); 
    int *d_a; 
    cudaMalloc((int**)&d_a, bytes); 

    memset(h_a, 0, bytes); 
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice); 
    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost); 

    return 0; 
}

Command line:

-bash-4.2$ nvcc profile.cu -o profile_test 
-bash-4.2$ nvprof ./profile_test

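So I copied it word for word and ran it with identical command-line arguments. Unfortunately, the result was the same as with my own program: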

-bash-4.2$ nvprof ./profile_test 
==85454== NVPROF is profiling process 85454, command: ./profile_test 
==85454== Profiling application: ./profile_test 
==85454== Profiling result: 
No kernels were profiled. 

==85454== API calls: 
No API activities were profiled.

I am running Nvidia's CUDA toolkit 7.5.

If anyone knows what I am doing wrong, I would be grateful for an answer.

----- EDIT -----

So I modified the code to:

#include<cuda_profiler_api.h> 

int main() 
{ 
    cudaProfilerStart(); 
    const unsigned int N = 1048576; 
    const unsigned int bytes = N * sizeof(int); 
    int *h_a = (int*)malloc(bytes); 
    int *d_a; 
    cudaMalloc((int**)&d_a, bytes); 

    memset(h_a, 0, bytes); 
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice); 
    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost); 

    cudaProfilerStop(); 
    return 0; 
}

Unfortunately, it did not change anything.


2016-05-01 theKunz

Out of curiosity, which kernel are you trying to profile? –

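@FlorentDUGUET It implements a compression algorithm for compressed sparse row (CSR) matrices. I am trying to get some metrics to gauge its performance. – theKunz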

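You should check the return values of all API calls; it is quite likely that you have an error you are not catching. You can also run the program through 'cuda-memcheck', which will report errors from API calls, but the best practice is to always check the return value of *any* API call. – Tom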
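As an illustration of the advice in the comment above, here is a minimal sketch of the question's program with every runtime API call checked. The `CUDA_CHECK` macro is a hypothetical helper written for this example, not something the CUDA toolkit provides; it relies only on `cudaError_t`, `cudaSuccess`, and `cudaGetErrorString` from the runtime API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of the CUDA toolkit): abort with a
// readable message if a runtime API call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

int main()
{
    const unsigned int N = 1048576;
    const unsigned int bytes = N * sizeof(int);

    // Host buffer, zero-initialised as in the question.
    int *h_a = (int*)malloc(bytes);
    if (h_a == NULL) {
        fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    memset(h_a, 0, bytes);

    // Every CUDA call is wrapped, so a failure (no device, driver mismatch,
    // out of memory, ...) is reported instead of silently ignored.
    int *d_a = NULL;
    CUDA_CHECK(cudaMalloc((void**)&d_a, bytes));
    CUDA_CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_a));
    free(h_a);
    return 0;
}
```

If the very first call already fails, the program prints the error string and exits, which would be consistent with nvprof having nothing to report. That is only one possible explanation, and the thread does not confirm it.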

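You need to call cudaProfilerStop() (for the runtime API) before the thread exits. This allows nvprof to collect all the necessary data.

According to the CUDA docs:

To avoid losing profile information that has not yet been flushed, the application being profiled should make sure, before exiting, that all GPU work is done (using CUDA synchronization calls), and then call cudaProfilerStop() or cuProfilerStop(). Doing so forces buffered profile information on corresponding context(s) to be flushed.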


2016-05-01 19:07:20

Alternatively, calling 'cudaDeviceReset' at exit will trigger the profile buffer flush during explicit context destruction. – talonmies
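The exit sequence described in the answer, together with the alternative from the comment above, could look roughly like this. It is a sketch under the assumption that the program uses the runtime API and a single device. As the asker reports further down, adding cudaProfilerStop() alone did not fix this particular case, so treat it as the documented flush pattern rather than a confirmed cure.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

int main()
{
    const unsigned int N = 1048576;
    const unsigned int bytes = N * sizeof(int);

    int *h_a = (int*)malloc(bytes);
    if (h_a == NULL) return EXIT_FAILURE;
    memset(h_a, 0, bytes);

    cudaProfilerStart();

    int *d_a = NULL;
    cudaError_t err = cudaMalloc((void**)&d_a, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);

    // Step 1: make sure all outstanding GPU work has finished.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    // Step 2: stop profiling, which asks the profiler to flush its buffers.
    cudaProfilerStop();

    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_a);
    free(h_a);

    // Alternative from the comment: explicitly destroy the context at exit,
    // which also triggers a flush of the profile buffers.
    cudaDeviceReset();

    return err == cudaSuccess ? 0 : EXIT_FAILURE;
}
```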

Tried your suggestion; unfortunately it still is not profiling. (See the edited code.) – theKunz

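It may be that the compiler optimized the code away, or that there is a problem with the API calls (i.e. check the error codes). You could also try profiling at least one kernel. –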
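To follow the last suggestion, a test program needs at least one kernel launch for nvprof to report. The sketch below is an illustration only: the `increment` kernel and its launch configuration are hypothetical and are not taken from the question or the blog post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to every element.
__global__ void increment(int *data, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;
}

int main()
{
    const unsigned int N = 1048576;
    const size_t bytes = N * sizeof(int);

    int *d_a = NULL;
    cudaError_t err = cudaMalloc((void**)&d_a, bytes);
    if (err == cudaSuccess)
        err = cudaMemset(d_a, 0, bytes);

    if (err == cudaSuccess) {
        const unsigned int threads = 256;
        const unsigned int blocks = (N + threads - 1) / threads;  // round up
        increment<<<blocks, threads>>>(d_a, N);

        // cudaGetLastError reports launch failures; cudaDeviceSynchronize
        // reports errors that occur while the kernel is running.
        err = cudaGetLastError();
        if (err == cudaSuccess)
            err = cudaDeviceSynchronize();
    }

    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_a);
    cudaDeviceReset();  // explicit context teardown before exit
    return err == cudaSuccess ? 0 : 1;
}
```

Built with `nvcc` and run under `nvprof` in the same way as the question's example, a working setup would be expected to list `increment` among the profiled kernels. If it reports an error or still shows nothing, that points toward the environment (driver, device visibility) rather than the test code.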
