Member function as a parameter to a CUDA kernel

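I am using dynamic parallelism, and I want to create a templated kernel that, given an object pointer plus a member function pointer, executes the function. Here is a minimal (non-)working example, compiled with the -arch=compute_35 -dlink flags: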

#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>

struct A 
{ 
    int i; 
    __device__ void clear() 
    { 
     i = 0; 
    } 
}; 

template<typename Object, typename memberFunction> 
__global__ void generalKernel(Object* o, memberFunction f) 
{ 
    (o->*f)(); 
} 

template<typename Object, typename memberFunction> 
__device__ void executeFunction(Object* o, memberFunction f) 
{ 
    generalKernel<<<1,1>>>(o,f); 
    cudaDeviceSynchronize(); 
} 

__global__ void mainKernel(A* a) 
{ 
    executeFunction(a, &A::clear); 
} 

int main(int argc, char * argv[]) 
{ 
    A* a; 
    cudaMallocManaged(&a, sizeof(A)); 
    a->i = 1; 

    mainKernel<<<1,1>>>(a); 
    cudaDeviceSynchronize(); 

    std::cout << a->i << std::endl; 

    return EXIT_SUCCESS; 
}


2017-11-10 Fabio T.

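Please provide a short, complete example that others can try to compile to see the problem. Also state your exact compile command and the exact compiler output.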

Updated with a complete example :). The error output is too long to post here.

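Below is a simple piece of CUDA code showing how to pass a member function pointer to a kernel. Everything is explained in the code comments.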

#include <cstdio>   // printf
#include <cstdlib>  // exit

#define gpuErrchk(val) \
    cudaErrorCheck(val, __FILE__, __LINE__, true)
// 'file' is const because __FILE__ expands to a string literal
void cudaErrorCheck(cudaError_t err, const char* file, int line, bool abort)
{
    if(err != cudaSuccess)
    {
        printf("%s %s %d\n", cudaGetErrorString(err), file, line);
        if(abort) exit(-1);
    }
}


// struct holds an 'int' type data member and '__device__' function member
struct ST 
{ 
    int id; 
    __device__ void foo() 
    { 
     printf("value of id: %d\n",id); 
    } 
}; 

// creating an alias for our function pointer 
// since the function is a member of a struct, we add struct name and scope resolution 'ST::' 
// to signify as such 
typedef void (ST::*Fptr)(void); 

// templated kernel 
template<typename Object, typename memberFunction> 
__global__ void kernel(Object* o, memberFunction f) 
{ 
    (o->*f)(); 
} 

// declaring a __device__ function pointer, assigning it the address of 'ST::foo' 
// remember that this function pointer is also directly accessible from the kernel
__device__ Fptr fp = &ST::foo; 


int main(int argc, char** argv) 
{ 
    // declaring and initializing a host 'ST' object 
    ST h_st; 
    h_st.id = 10; 


    // device 'ST' object 
    ST* d_st; 
    // allocating device memory 
    gpuErrchk(cudaMalloc((void**)&d_st, sizeof(ST))); 
    // copying host data from host object to device object 
    gpuErrchk(cudaMemcpy(d_st, &h_st, sizeof(ST), cudaMemcpyHostToDevice)); 

    // declaring host side function pointer of type 'Fptr', which can be passed to kernel as argument 
    Fptr h_fptr; 
    // copying address of '__device__' function pointer to host side function pointer 
    gpuErrchk(cudaMemcpyFromSymbol(&h_fptr, fp, sizeof(Fptr))); 


    // passing arguments to kernel 
    kernel<<<1,1>>>(d_st,h_fptr); 

    // making sure no errors occurred
    gpuErrchk(cudaPeekAtLastError()); 
    gpuErrchk(cudaDeviceSynchronize()); 

    // free device memory 
    gpuErrchk(cudaFree(d_st)); 

    return 0; 
}


2017-11-11 02:05:07 zindarod
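
The answer above works because the value of the member function pointer is produced in device code (the initializer of the `__device__` variable `fp`) and only then copied to the host. As a variation that is not part of the original answer, the same idea can be sketched with a small setup kernel that writes the pointer into managed memory. The names `storePointer`, `bump` and `runBump` are hypothetical and chosen only for this illustration.

```cuda
#include <cuda_runtime.h>

struct ST
{
    int id;
    __device__ void bump() { id += 1; }
};

typedef void (ST::*Fptr)();

// Same templated kernel as in the answer: calls the member function on the object.
template<typename Object, typename memberFunction>
__global__ void kernel(Object* o, memberFunction f)
{
    (o->*f)();
}

// Hypothetical setup kernel: the address of ST::bump is taken in device code,
// so the stored value is meaningful on the device.
__global__ void storePointer(Fptr* out)
{
    *out = &ST::bump;
}

// Hypothetical host helper: returns st->id after the call, or -1 on any CUDA error.
int runBump()
{
    ST* st = nullptr;
    Fptr* fp = nullptr;
    if (cudaMallocManaged(&st, sizeof(ST)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&fp, sizeof(Fptr)) != cudaSuccess) { cudaFree(st); return -1; }
    st->id = 10;

    int result = -1;
    storePointer<<<1, 1>>>(fp);
    if (cudaDeviceSynchronize() == cudaSuccess)
    {
        // The host only copies the pointer value; it never calls through it.
        kernel<<<1, 1>>>(st, *fp);
        if (cudaDeviceSynchronize() == cudaSuccess) result = st->id;
    }

    cudaFree(fp);
    cudaFree(st);
    return result;
}
```

On a device that supports managed memory, `runBump()` should return 11. The sketch still needs one typedef per member function signature, so it changes where the pointer comes from, not how generic the code is.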

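I am looking for a one-size-fits-all solution, which is why I am using templates. Having to typedef the function pointer and declare it every time is the worst way to code this, rather than creating a kernel for every function to be called.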

Read this on how best to use function pointers: https://isocpp.org/wiki/faq/pointers-to-members – zindarod

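I managed to get it to compile by using an alias. The problem now is related to unified memory, as in this case: https://stackoverflow.com/questions/42296854/assignment-of-function-pointer-with-the-unified-memory-in-cuda (a similar situation). I am dropping this idea and will look for another one.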
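
One possible way to keep a single generic kernel without a typedef per function, not proposed anywhere in the thread, is to pass the member function as a non-type template argument instead of a runtime kernel argument. The call target is then fixed at compile time, so no function pointer value has to travel between host and device. This is a minimal sketch under assumptions: it launches from the host rather than through dynamic parallelism, and `runClear` is a hypothetical helper name.

```cuda
#include <cuda_runtime.h>

struct A
{
    int i;
    __device__ void clear() { i = 0; }
};

// The member function is a template argument, not a kernel argument,
// so the call is resolved at compile time inside device code.
template<typename Object, void (Object::*f)()>
__global__ void generalKernel(Object* o)
{
    (o->*f)();
}

// Hypothetical host helper: returns a->i after the kernel ran, or -1 on any CUDA error.
int runClear()
{
    A* a = nullptr;
    if (cudaMallocManaged(&a, sizeof(A)) != cudaSuccess) return -1;
    a->i = 1;

    generalKernel<A, &A::clear><<<1, 1>>>(a);

    int result = -1;
    if (cudaDeviceSynchronize() == cudaSuccess) result = a->i;

    cudaFree(a);
    return result;
}
```

If the kernel runs, `runClear()` should return 0. The trade-off is that each distinct member function instantiates its own copy of `generalKernel`, and the function to call can no longer be chosen at run time.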
