Multiple __global__ functions in the same CUDA source file


Can I write two separate __global__ functions in the same CUDA source file to compute different things? Something like this:

__global__ void Ker1(mpz_t *d,mpz_t *c,mpz_t e,mpz_t n) 
{ 
    int i=blockIdx.x*blockDim.x + threadIdx.x; 

    mpz_powm (d[i], c[i], e, n); 

} 

__global__ void Ker2(mpz_t *d,mpz_t *c,mpz_t d, mpz_t n) 
{ 
    int i=blockIdx.x*blockDim.x + threadIdx.x; 
    mpz_powm(c[i], d[i],d, n); 

} 


int main() 
{ 
    /* ... */ 
    cudaMemcpy(decode_device,decode_buffer,memSize,cudaMemcpyHostToDevice); 
    Ker1<<<dimGrid , dimBlock >>>(d_device,c_device,e,n); 
    Ker2<<<dimGrid , dimBlock>>>(c_device,d_device,d,n); 
    cudaMemcpy(decode_buffer,decode_device,memSize,cudaMemcpyDeviceToHost); 
}

If not, how would you do something like this?


2016-12-08 wolfgunner

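Don't spam tags. – Olaf

Just try it... – tera

I tried to answer your question about writing different kernels in the same source file. However, please consider improving your question next time. – Taro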

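It is quite unclear what you are asking, but after three readings I assume it is: "Can I write several kernels in the same source file?" Yes, you can. A single source file may define any number of kernels, and you can write as many kernel launches as you want in the main function.

Here is the example from page 9: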

... 
cudaMemcpy(dev1, host1, size, H2D) ; 
kernel2 <<< grid, block, 0 >>> (..., dev2, ... ) ; 
kernel3 <<< grid, block, 0 >>> (..., dev3, ... ) ; 
cudaMemcpy(host4, dev4, size, D2H) ; 
...

Source: Streams and concurrency webinar
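
As a minimal sketch of the pattern in the excerpt above, the following puts two kernels in one .cu file and launches them back to back on the same buffer. The kernel names, the helper runBothKernels, and the arithmetic they perform are hypothetical and chosen only for illustration. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// First kernel: doubles every element (hypothetical example work).
__global__ void doubleKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Second kernel, defined in the same file: adds one to every element.
__global__ void addOneKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Host helper (hypothetical): copy in, launch both kernels, copy out.
std::vector<int> runBothKernels(std::vector<int> host)
{
    if (host.empty()) return host;           // a grid of 0 blocks is invalid
    int n = static_cast<int>(host.size());
    size_t bytes = n * sizeof(int);

    int *dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    int block = 256;
    int grid = (n + block - 1) / block;      // enough blocks to cover n
    doubleKernel<<<grid, block>>>(dev, n);
    // Same (default) stream, so this runs after doubleKernel has finished.
    addOneKernel<<<grid, block>>>(dev, n);

    // Blocking copy: waits for both kernels before reading the result.
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Called from main as runBothKernels({1, 2, 3}), this should return {3, 5, 7}, because both launches go to the same stream and therefore execute in issue order.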

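Kernel launches are asynchronous by default, so once a kernel has been launched on the GPU, the CPU carries on with the instructions that follow. To force synchronization, you must call cudaDeviceSynchronize() or perform a memory transfer with cudaMemcpy, which synchronizes on its own.

Source: CUDA FAQ.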

Q: Can the CPU and GPU run in parallel? Kernel invocation in CUDA is asynchronous, so the driver will return control to the application as soon as it has launched the kernel.

The "cudaThreadSynchronize()" API call should be used when measuring performance to ensure that all device operations have completed before stopping the timer.

CUDA functions that perform memory copies and that control graphics interoperability are synchronous, and implicitly wait for all kernels to complete.
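
To illustrate the asynchronous launch described above, here is a hedged sketch in which the host reads the kernel's output directly, so an explicit cudaDeviceSynchronize() is needed (cudaThreadSynchronize() in the FAQ text is the older, now deprecated, name for the same operation). It assumes a GPU and toolkit that support managed memory via cudaMallocManaged. The names fillKernel and fillAndSum are hypothetical.

```cuda
#include <cuda_runtime.h>

// Writes data[i] = i for every valid index.
__global__ void fillKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Hypothetical helper: returns 0 + 1 + ... + (n - 1), or -1 if a CUDA
// call fails. Assumes n > 0.
long long fillAndSum(int n)
{
    int *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(int)) != cudaSuccess) return -1;

    int block = 256;
    int grid = (n + block - 1) / block;
    fillKernel<<<grid, block>>>(data, n);    // returns to the host immediately

    // Launch-configuration errors are reported by cudaGetLastError() ...
    cudaError_t err = cudaGetLastError();
    // ... while cudaDeviceSynchronize() blocks until the kernel has finished
    // and reports errors that occurred during execution.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    long long sum = -1;
    if (err == cudaSuccess) {
        sum = 0;
        // Reading managed memory on the host is safe only after the sync.
        for (int i = 0; i < n; ++i) sum += data[i];
    }
    cudaFree(data);
    return sum;
}
```

Without the cudaDeviceSynchronize() call the host loop could start before the kernel has written anything. With an explicit cudaMemcpy back to the host, as in the question, the copy itself provides that wait.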

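By the way, if you do not need synchronization between the kernels, they can execute concurrently, provided your GPU has the required compute capability (CC):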

Q: Is it possible to execute multiple kernels at the same time? Yes. GPUs of compute capability 2.x or higher support concurrent kernel execution and launches.

(Also from the CUDA FAQ.)
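
Concurrent execution requires the kernels to be issued to different non-default streams. The following is a sketch under that assumption, with two independent kernels working on separate buffers. The names squareKernel, negateKernel and runConcurrently are hypothetical, and error checking is omitted. Note that separate streams only permit overlap; whether the kernels actually run at the same time depends on the hardware and on available resources.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void squareKernel(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * x[i];
}

__global__ void negateKernel(float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = -y[i];
}

// Hypothetical helper: squares a and negates b, each in its own stream.
// Assumes both vectors are non-empty.
void runConcurrently(std::vector<float> &a, std::vector<float> &b)
{
    int na = static_cast<int>(a.size());
    int nb = static_cast<int>(b.size());
    float *devA = nullptr, *devB = nullptr;
    cudaMalloc(&devA, na * sizeof(float));
    cudaMalloc(&devB, nb * sizeof(float));
    cudaMemcpy(devA, a.data(), na * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(devB, b.data(), nb * sizeof(float), cudaMemcpyHostToDevice);

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    int block = 256;
    // The fourth launch parameter selects the stream; the third is the
    // dynamic shared memory size (none needed here).
    squareKernel<<<(na + block - 1) / block, block, 0, s1>>>(devA, na);
    negateKernel<<<(nb + block - 1) / block, block, 0, s2>>>(devB, nb);

    // Wait for each stream before copying its results back.
    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);
    cudaMemcpy(a.data(), devA, na * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), devB, nb * sizeof(float), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(devA);
    cudaFree(devB);
}
```

This does not apply to the kernels in the question as written, since Ker2 consumes the output of Ker1 and must therefore run after it.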


2016-12-08 17:05:33 Taro

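It doesn't work: error: duplicate parameter name – wolfgunner

That is because your second kernel has two parameters named "d", even though one of them is a pointer. No compiler will accept that. My answer still addresses the question as you asked it. By the way, you seem to be just pasting your errors without making any effort to solve them, expecting us to hand you a bug-free version of your code. That is very tiresome behaviour, and I will stop discussing this topic here. – Taro

You are completely right, I had not paid attention to the d. I will make more of an effort in future, and maybe I will stop asking stupid questions... I think this way is very good. – wolfgunner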
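
For reference, the duplicate-parameter error discussed in the comments goes away once every parameter has a distinct name. The sketch below shows the two kernels with renamed parameters. Because GMP's mpz_t and mpz_powm are host-side library code, it substitutes plain 64-bit integers and a hypothetical device function powmod, which is an assumption of this example and not part of the original question. The helper rsaRoundTrip and the count parameter are likewise hypothetical, and the arithmetic is only valid for a modulus below 2^32.

```cuda
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for mpz_powm: (base^exp) mod n by square-and-multiply.
// Assumes n < 2^32 so that every intermediate product fits in 64 bits.
__device__ uint64_t powmod(uint64_t base, uint64_t exp, uint64_t n)
{
    uint64_t result = 1 % n;
    base %= n;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % n;
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

// out[i] = in[i]^e mod n
__global__ void Ker1(uint64_t *out, const uint64_t *in,
                     uint64_t e, uint64_t n, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) out[i] = powmod(in[i], e, n);
}

// out[i] = in[i]^dExp mod n; the exponent is named dExp so that it no
// longer collides with any other parameter name.
__global__ void Ker2(uint64_t *out, const uint64_t *in,
                     uint64_t dExp, uint64_t n, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) out[i] = powmod(in[i], dExp, n);
}

// Hypothetical helper: applies Ker1, then Ker2, and returns the result.
// Assumes msg is non-empty.
std::vector<uint64_t> rsaRoundTrip(std::vector<uint64_t> msg,
                                   uint64_t e, uint64_t dExp, uint64_t n)
{
    int count = static_cast<int>(msg.size());
    size_t bytes = count * sizeof(uint64_t);
    uint64_t *plainDev = nullptr, *cipherDev = nullptr;
    cudaMalloc(&plainDev, bytes);
    cudaMalloc(&cipherDev, bytes);
    cudaMemcpy(plainDev, msg.data(), bytes, cudaMemcpyHostToDevice);

    int block = 256;
    int grid = (count + block - 1) / block;
    Ker1<<<grid, block>>>(cipherDev, plainDev, e, n, count);
    Ker2<<<grid, block>>>(plainDev, cipherDev, dExp, n, count);

    cudaMemcpy(msg.data(), plainDev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(plainDev);
    cudaFree(cipherDev);
    return msg;
}
```

With the textbook toy key n = 3233, e = 17, dExp = 2753, the round trip should return the input values unchanged as long as each value is smaller than n.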
