CUDA linking error

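I am having trouble building some basic CUDA/Thrust code that I wrote to get more familiar with GPU programming. I am probably not compiling it correctly, so I would like to know what I am doing wrong.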

I am using the following commands:

nvcc -c gpu_functions.cu 
nvcc gpu_functions.o gpu_test.cu -o gpu_test

However, I get a linker error when building:

[email protected]:~/dev/analytics/src$ nvcc gpu_functions.o gpu_test.cu -o gpu_test 
/tmp/tmpxft_00002383_00000000-14_gpu_test.o: In function `main': 
tmpxft_00002383_00000000-3_gpu_test.cudafe1.cpp:(.text+0x6e): undefined reference to `void add<thrust::device_vector<int, thrust::device_malloc_allocator<int> > >(thrust::device_vector<int, thrust::device_malloc_allocator<int> > const&, thrust::device_vector<int, thrust::device_malloc_allocator<int> > const&, thrust::device_vector<int, thrust::device_malloc_allocator<int> >&)' 
collect2: ld returned 1 exit status

I have three files:

gpu_functions.h (header declaring the GPU functions)
gpu_functions.cu (implementation of the GPU functions)
gpu_test.cu (main program that calls the GPU functions I defined)

gpu_functions.h

template<typename Vector> 
void add(const Vector& in1, const Vector& in2, Vector& out);

gpu_functions.cu

#include "gpu_functions.h" 
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/copy.h> 
#include <thrust/fill.h> 
#include <thrust/replace.h> 
#include <thrust/functional.h> 

using namespace thrust; 

template<typename Vector> 
void add(const Vector& in1, const Vector& in2, Vector& out) { 
    transform(in1.begin(), in1.end(), in2.begin(), out.begin(),
              plus<typename Vector::value_type>());
}

gpu_test.cu

#include "gpu_functions.h"
#include <thrust/device_vector.h> 
#include <iostream> 
#include <stdio.h> 

using namespace thrust; 

int main(void) { 
    const int n = 100000000; 
    // allocate three device_vectors with n elements each
    device_vector<int> in1(n, 1); 
    device_vector<int> in2(n, 2); 
    device_vector<int> out(n, 0); 

    add(in1, in2, out); 

    thrust::copy(out.begin(), out.begin()+10, std::ostream_iterator<int>(std::cout,"\n")); 

    return 0;  
}

I am probably doing something silly, or missing something very obvious.

Source

2014-01-29 jimjampez

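When using templates, all explicit specialization declarations must be visible at the point of template instantiation. In your case, 'add' is defined in 'gpu_functions.cu' but never instantiated there, and its definition is not visible in 'gpu_test.cu'. Try moving the definition of 'add' from 'gpu_functions.cu' to 'gpu_test.cu'. – JackOLantern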

@harrism I have posted an answer. – JackOLantern

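Once declared, a template function must be instantiated, either explicitly or implicitly. That is, a concrete function (an instance) has to be generated for a specific combination of template arguments.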

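In the gpu_functions.cu compilation unit, you have neither. In other words, the compiler never generates an instance of the function add, so the linker finds nothing to link against.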
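To make the missing piece concrete, here is a minimal host-only sketch of explicit instantiation. It assumes std::vector in place of thrust::device_vector and std::transform in place of thrust::transform, purely so that it builds with an ordinary C++ compiler. The last statement forces the compiler to emit a concrete add that a linker can find. For the question's code, the analogous line at the end of gpu_functions.cu (with <thrust/device_vector.h> included there) would be along the lines of: template void add<thrust::device_vector<int> >(const thrust::device_vector<int>&, const thrust::device_vector<int>&, thrust::device_vector<int>&);

```cpp
#include <algorithm>   // std::transform
#include <functional>  // std::plus
#include <vector>

// Declaration only, as it would appear in the header.
template <typename Vector>
void add(const Vector& in1, const Vector& in2, Vector& out);

// Definition, as it would appear in the implementation file.
// std::transform stands in for thrust::transform (assumption for this sketch).
template <typename Vector>
void add(const Vector& in1, const Vector& in2, Vector& out) {
    std::transform(in1.begin(), in1.end(), in2.begin(), out.begin(),
                   std::plus<typename Vector::value_type>());
}

// Explicit instantiation: generates add<std::vector<int>> in this
// translation unit even though nothing here calls it.
template void add<std::vector<int> >(const std::vector<int>&,
                                     const std::vector<int>&,
                                     std::vector<int>&);
```

The trade-off of this approach is that every type you intend to pass to add must be listed explicitly in the implementation file.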

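You should fix this by placing the template function definition where it is implicitly instantiated, namely in the compilation unit that contains the main function.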

In other words, the following code compiles correctly:

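#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>
#include <stdio.h>

using namespace thrust;

// Defined in the same compilation unit as main, so the call in main
// can instantiate it implicitly.
template<typename Vector>
void add(const Vector& in1, const Vector& in2, Vector& out) {
    transform(in1.begin(), in1.end(), in2.begin(), out.begin(),
              plus<typename Vector::value_type>());
}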

int main(void) { 
    const int n = 100000000; 
    device_vector<int> in1(n, 1); 
    device_vector<int> in2(n, 2); 
    device_vector<int> out(n, 0); 

    add(in1, in2, out); 

    thrust::copy(out.begin(), out.begin()+10, std::ostream_iterator<int>(std::cout,"\n")); 

    return 0;  
}

Of course, you can move the template function definition into a separate .cuh file and pull it in with an #include directive.
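A minimal sketch of that layout, again host-only and assuming std::vector stands in for thrust::device_vector. The file name gpu_functions.cuh, the guard name GPU_FUNCTIONS_CUH, and the helper add_example are hypothetical and used only for illustration. Because the full definition lives in the header, any source file that includes it can instantiate add implicitly for whatever type it passes.

```cpp
// Contents of a hypothetical gpu_functions.cuh (shown host-only).
#ifndef GPU_FUNCTIONS_CUH
#define GPU_FUNCTIONS_CUH

#include <algorithm>   // std::transform (stand-in for thrust::transform)
#include <functional>  // std::plus

// Full definition in the header, not just a declaration.
template <typename Vector>
void add(const Vector& in1, const Vector& in2, Vector& out) {
    std::transform(in1.begin(), in1.end(), in2.begin(), out.begin(),
                   std::plus<typename Vector::value_type>());
}

#endif  // GPU_FUNCTIONS_CUH

// What a caller such as gpu_test.cu might do after including the header.
#include <vector>

std::vector<int> add_example() {
    std::vector<int> in1(4, 1), in2(4, 2), out(4, 0);
    add(in1, in2, out);  // implicitly instantiates add<std::vector<int>>
    return out;          // each element is 1 + 2
}
```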

Finally, always remember to add CUDA error checking.

Source

2014-01-31 08:41:56 JackOLantern
