2013-10-18

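CUDA dynamic parallelism error: LNK2001

OK, I've switched from Python to C++ in VS2012 in an effort to get this project rolling again. I've run into plenty of obstacles and headaches while learning the ins and outs. This is my latest and most frustrating one, together with the build error that comes with it.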

1> C:\Users\Karsten Chu\New Google Drive\Research\Visual Studio 2012\Projects\Dynamic Parallelism Test\Dynamic Parallelism Test>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvcc.exe" -dlink -o "x64\Debug\Dynamic Parallelism Test.device-link.obj" -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\lib\x64" cuda.lib cudart.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib -lcudadevrt -gencode=arch=compute_35,code=sm_35 -G --machine 64 "x64\Debug\CUDA Test 2.cu.obj" "x64\Debug\CUDA Test.cu.obj" 
1>Dynamic Parallelism Test.device-link.obj : error LNK2001: unresolved external symbol __fatbinwrap_54_tmpxft_00000634_00000000_8_cuda_device_runtime_cpp1_ii_5f6993ef 
1>C:\Users\Karsten Chu\New Google Drive\Research\Visual Studio 2012\Projects\Dynamic Parallelism Test\x64\Debug\Dynamic Parallelism Test.exe : fatal error LNK1120: 1 unresolved externals 
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ========== 

And my code:

#include <cuda.h> 
#include <cuda_runtime.h> 
#include <device_launch_parameters.h> 
#include <stdio.h> 
#include <iostream> 
using namespace std; 

__global__ void rkf5(double*, int*); 
__global__ void k1(double*); 

int main2(int argc, char** argv) 
{ 
    const int max_length = 5; 
    double concs[max_length]; 
    for (int i=0; i<max_length; i++) 
    { 
        concs[i]=0; 
        //std::cout<<concs[i]; 
    } 

    double *d_concs; 
    int *d_max_length; 
    size_t size_concs = sizeof(concs); 
    size_t size_max_length = sizeof(max_length); 
    cudaMalloc((void**)&d_concs, size_concs); 
    cudaMemcpy(d_concs, concs, size_concs, cudaMemcpyHostToDevice); 
    cudaMalloc((void**)&d_max_length, size_max_length); 
    cudaMemcpy(d_concs, concs, size_concs, cudaMemcpyHostToDevice); 
    rkf5<<<1,max_length>>>(d_concs, d_max_length); 
    cudaMemcpy(concs, d_concs, size_concs, cudaMemcpyDeviceToHost); 

    for (int i=0; i<max_length; i++) 
    { 
        std::cout<<concs[i]; 
    } 
    return 0; 
} 

__global__ void rkf5(double* concs, int* max_length) 
{ 
    int idx = blockIdx.x * blockDim.x + threadIdx.x; 
    concs[idx]=idx; 
    //dim3 threads = dim3(max_length); 
    k1<<< 1, *max_length >>>(concs); 
} 
__global__ void k1(double* concs) 
{ 
    int idx = blockIdx.x * blockDim.x + threadIdx.x; 
    concs[idx]=0; 
} 

Please help me out here. I've spent a lot of time Googling this problem, and every lead I've found ends without a posted solution.

Answers


Your compile and link command line:

nvcc.exe -dlink -o "x64\Debug\Dynamic Parallelism Test.device-link.obj" 
-Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " 
-L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\lib\x64" 
cuda.lib cudart.lib kernel32.lib user32.lib gdi32.lib winspool.lib 
comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib 
odbc32.lib odbccp32.lib -lcudadevrt -gencode=arch=compute_35,code=sm_35 -G 
--machine 64 "x64\Debug\CUDA Test 2.cu.obj" "x64\Debug\CUDA Test.cu.obj" 

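You are trying to link against cudadevrt in the Linux style (-lcudadevrt). That will not work on Windows, where the linker being invoked is the Visual Studio one. Add cudadevrt.lib to your linker inputs, the same way cudart.lib is listed.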
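As a rough illustration of "add cudadevrt.lib to your linker inputs", the project setting might look like the fragment below. The property path and the `%(AdditionalDependencies)` macro are assumptions based on a typical Visual Studio 2012 C++ project; they do not appear in the original post, so check them against your own property pages.

```text
Configuration Properties > Linker > Input > Additional Dependencies
    cudart.lib;cudadevrt.lib;%(AdditionalDependencies)
```

The point is that the host linker, which is the tool reporting LNK2001 in the log above, sees cudadevrt.lib by name.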


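Excellent, thanks for catching that. I'll try to compile all of this trial-and-error experience into a write-up so that people who come after me don't fall into the same traps I did.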


I have successfully compiled and run the code by:

  1. following the procedure in "Using CUDA dynamic parallelism in Visual Studio 2010";
  2. renaming main2 to main.

The program outputs 0123 :-)
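The referenced procedure is not reproduced in this thread. As a hedged reminder, the compile-side settings that dynamic parallelism usually needs in a Visual Studio CUDA project are sketched below; the property names are assumptions based on the CUDA 5.x build customization and may differ in your installation.

```text
Configuration Properties > CUDA C/C++ > Common
    Generate Relocatable Device Code:  Yes (-rdc=true)
Configuration Properties > CUDA C/C++ > Device
    Code Generation:                   compute_35,sm_35
```

The build log in the question already shows compute_35/sm_35 and a separate device-link step, so these two settings were probably in place there.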


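Haha, yes, pardon the silliness with main and main2. I'm very new to C++ and VS2012, so I tend to make small layman's fixes just to get error messages to go away. Curious: this means my code is not reaching the child kernel, otherwise it should output 0. Thanks for taking the time!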
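One likely reason the child kernel never takes effect: in the posted code the second cudaMemcpy repeats the copy of concs, so d_max_length is allocated but never written, and the child launch reads an uninitialized block size. The sketch below is one possible correction, not the asker's final code. Launching the child from a single thread, the __syncthreads() call, and the error checks are additions made here for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: zeroes one element per thread.
__global__ void k1(double* concs)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    concs[idx] = 0;
}

// Parent kernel: writes the thread index, then launches the child grid once.
__global__ void rkf5(double* concs, int* max_length)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    concs[idx] = idx;
    __syncthreads();  // every parent write in the block is done before the launch

    if (idx == 0) {   // assumption: one child grid is enough for this test
        k1<<<1, *max_length>>>(concs);
        cudaError_t err = cudaGetLastError();  // device-side launch status
        if (err != cudaSuccess) {
            printf("child launch failed: %s\n", cudaGetErrorString(err));
        }
    }
}

int main()
{
    const int max_length = 5;
    double concs[max_length] = {0};
    double* d_concs = NULL;
    int* d_max_length = NULL;

    cudaMalloc((void**)&d_concs, sizeof(concs));
    cudaMalloc((void**)&d_max_length, sizeof(max_length));
    cudaMemcpy(d_concs, concs, sizeof(concs), cudaMemcpyHostToDevice);
    // The copy that is missing from the original: initialize d_max_length.
    cudaMemcpy(d_max_length, &max_length, sizeof(max_length),
               cudaMemcpyHostToDevice);

    rkf5<<<1, max_length>>>(d_concs, d_max_length);
    cudaError_t err = cudaDeviceSynchronize();  // waits for parent and child grids
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaMemcpy(concs, d_concs, sizeof(concs), cudaMemcpyDeviceToHost);
    for (int i = 0; i < max_length; i++) {
        printf("%g ", concs[i]);
    }
    printf("\n");

    cudaFree(d_concs);
    cudaFree(d_max_length);
    return 0;
}
```

If the child grid runs, every element should come back as 0 rather than the thread index. This needs the same build setup discussed in the answers: a compute capability 3.5 target, relocatable device code, and cudadevrt among the link inputs.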