cuPrintf does nothing (program also uses pinned + mapped memory and CUBLAS)

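I need to print a few values from a CUDA kernel and am trying to use cuPrintf. My device has compute capability 1.1, so I cannot use printf. The program compiles correctly and gives no runtime errors. However, the cuPrintf lines appear to do nothing. Here are some of the things I have tried: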

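Compiling with -arch sm_11
Wrapping each kernel call in cudaPrintfInit and cudaPrintfEnd
Making sure the number of characters is small enough to work with the default buffer size
Making sure cudaPrintfInit and cudaPrintfDisplay return cudaSuccess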

Apart from the usual things, my program uses the following:

The CUBLAS library
Page-locked (pinned) + mapped memory

Why do the cuPrintf calls do nothing?

Edit
Here are some relevant snippets from the code:

__global__ void swap_rows(float *d_A, int r1, int r2, int n) 
{ 
    int i = r1; 
    int j = blockDim.x*blockIdx.x + threadIdx.x; 
    cuPrintf("(%d,%d) ", i, j); 

    if(j >= n) return; 
    float tmp; 
    tmp = d_A[L(i,j)]; 
    d_A[L(i,j)] = d_A[L(r2,j)]; 
    d_A[L(r2,j)] = tmp; 
} 

extern "C" float *someFunction(float *_A, float *_b, int n) 
{ 
    int i, i_max, k, n2 = n*n; 
    dim3 lblock_size(32,1); 
    dim3 lgrid_size(n/lblock_size.x + 1, 1); 
    float *d_A, *d_b, *d_x, *h_A, *h_b, *h_x, tmp, dotpdt; 

    cublasStatus status; 
    cudaError_t ret; 

    if((ret = cudaSetDeviceFlags(cudaDeviceMapHost)) != cudaSuccess) {
        fprintf(stderr, "Error setting device flag: %s\n",
                cudaGetErrorString(ret));
        return NULL;
    }

    // Allocate mem for A and copy data
    if((ret = cudaHostAlloc((void **)&h_A, n2 * sizeof(float),
                            cudaHostAllocMapped)) != cudaSuccess) {
        fprintf(stderr, "Error allocating page-locked h_A: %s\n",
                cudaGetErrorString(ret));
        return NULL;
    }

    if((ret = cudaHostGetDevicePointer((void **)&d_A, h_A, 0)) != cudaSuccess) {
        fprintf(stderr, "Error getting devptr for page-locked h_A: %s\n",
                cudaGetErrorString(ret));
        return NULL;
    }

    if((ret = cudaMemcpy(h_A, _A, n2 * sizeof(float), cudaMemcpyHostToHost)) !=
       cudaSuccess) {
        fprintf(stderr, "Error copying A into h_A: %s\n", cudaGetErrorString(ret));
        return NULL;
    }

    // Some code to compute k and i_max 

    if(cudaPrintfInit() != cudaSuccess)
        printf("cudaPrintfInit failed\n");

    swap_rows<<<lgrid_size,lblock_size>>>(d_A, k, i_max, n);
    if((ret = cudaThreadSynchronize()) != cudaSuccess)
        fprintf(stderr, "Synchronize failed: %s\n", cudaGetErrorString(ret));

    if(cudaPrintfDisplay(stdout, true) != cudaSuccess)
        printf("cudaPrintfDisplay failed\n");
    cudaPrintfEnd();

// Some more code 
}

I forgot to mention: these functions are compiled separately from the main() function, as a dynamically linked module (shared object).

Asked by fhl, 2013-03-17

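Perhaps you should post some code. What happens if you add a cudaDeviceSynchronize() call after the kernel launch and do CUDA error checking on it? – 2013-03-17 19:52:35

@RobertCrovella - I have edited the question to include some code. Since I am using an older version of the API, I use cudaThreadSynchronize() rather than cudaDeviceSynchronize(), and it does not return any error. – fhl 2013-03-18 02:24:13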

Which version are you using? What is needed is error checking on the kernel call, using a method like the one described here (http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-), using cudaPeekAtLastError(). – 2013-03-18 02:47:03
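The kind of check suggested in the comments could look roughly like the sketch below. It is an illustration under assumptions, not the commenter's exact code: cudaOk, CUDA_OK, noop_kernel and launch_and_check are hypothetical names introduced here. cudaGetLastError() is queried right after the launch to catch launch-time failures, and the synchronize call reports errors raised while the kernel runs. On toolkits that predate cudaDeviceSynchronize(), cudaThreadSynchronize() plays the same role.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print a message when a CUDA runtime call failed.
// Returns true when err is cudaSuccess.
static bool cudaOk(cudaError_t err, const char *what, const char *file, int line)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s failed at %s:%d: %s\n",
            what, file, line, cudaGetErrorString(err));
    return false;
}
#define CUDA_OK(call) cudaOk((call), #call, __FILE__, __LINE__)

// Empty kernel, used only so that there is something to launch.
__global__ void noop_kernel() {}

// Launch the kernel with the given shape and return the first error seen.
cudaError_t launch_and_check(dim3 grid, dim3 block)
{
    noop_kernel<<<grid, block>>>();

    // Launch-time failures (for example a block that is too large) show up here.
    cudaError_t err = cudaGetLastError();
    if (!cudaOk(err, "kernel launch", __FILE__, __LINE__)) return err;

    // Failures that occur while the kernel runs show up at the next sync.
    err = cudaDeviceSynchronize();
    cudaOk(err, "kernel execution", __FILE__, __LINE__);
    return err;
}
```

Runtime API calls can be wrapped the same way, e.g. if (!CUDA_OK(cudaSetDeviceFlags(cudaDeviceMapHost))) return NULL;. Applying the check to every kernel launch in the program, not only the one being debugged, is what would surface a failure in an unrelated kernel.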

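Figured it out: I had another kernel that was giving an "invalid configuration argument" error. I was using a block size of 32 * 32 * 1 for that kernel (1024 threads), which exceeds the maximum number of threads allowed per block (512 on compute capability 1.x). Once that problem was fixed, cuPrintf started working.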
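A block shape can be checked against the device limits before launching. The following is a minimal sketch, assuming the limits are read from cudaDeviceProp; block_fits and block_fits_on_device are hypothetical helper names, not part of the CUDA runtime.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true when `block` respects the per-block limits in `prop`.
bool block_fits(dim3 block, const cudaDeviceProp &prop)
{
    unsigned long long total =
        (unsigned long long)block.x * block.y * block.z;

    return total > 0
        && total <= (unsigned long long)prop.maxThreadsPerBlock
        && block.x <= (unsigned int)prop.maxThreadsDim[0]
        && block.y <= (unsigned int)prop.maxThreadsDim[1]
        && block.z <= (unsigned int)prop.maxThreadsDim[2];
}

// Hypothetical helper: same check, with the limits queried from a device.
// Returns false if the device properties cannot be read.
bool block_fits_on_device(dim3 block, int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    return block_fits(block, prop);
}
```

With the compute capability 1.x limits (512 threads per block), block_fits rejects dim3(32, 32, 1) and accepts dim3(32, 16, 1). Calling block_fits_on_device(block, 0) before each launch is one way to catch this class of mistake early.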

Answered by fhl, 2013-03-18 05:30:35
