Compiling code that uses dynamic parallelism fails

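I am doing dynamic parallelism programming with CUDA 5.5 and an NVIDIA GeForce GTX 780, which has compute capability 3.5. I am calling a kernel function from inside another kernel function, but compilation fails with this error: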

error : calling a __global__ function("kernel_6") from a __global__ function("kernel_5") is only allowed on the compute_35 architecture or above

What am I doing wrong?


2013-10-10 user2865500

You need to make nvcc generate CC 3.5 code for your device. You can do this by adding the following option to the nvcc command line:

-gencode arch=compute_35,code=sm_35

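See the CUDA samples on dynamic parallelism for more details. They include the command-line options and project settings for every supported operating system.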

http://docs.nvidia.com/cuda/cuda-samples/index.html#simple-quicksort--cuda-dynamic-parallelism-
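For reference, here is a minimal sketch of the kind of code the question describes: a parent kernel (kernel_5) that launches a child kernel (kernel_6) from the device. Only the two kernel names come from the error message. The kernel bodies, the run_demo helper, and the file name simple1.cu are hypothetical. Besides CC 3.5 code generation, a device-side launch also needs relocatable device code and the cudadevrt library, so the build line in the file's header comment includes both.

```cuda
// simple1.cu: hypothetical minimal dynamic-parallelism program.
// Build (CC 3.5 GPU): nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Child kernel: element i receives 2 * i.
__global__ void kernel_6(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * i;
}

// Parent kernel: one thread launches the child grid from the device.
// This device-side launch is what requires sm_35, -rdc=true and cudadevrt.
__global__ void kernel_5(int *out, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        kernel_6<<<(n + 255) / 256, 256>>>(out, n);
        cudaError_t e = cudaGetLastError();  // device-runtime launch check
        if (e != cudaSuccess) printf("child launch failed: %d\n", (int)e);
    }
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int run_demo(int n)
{
    int *d_out = 0;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) return -1;

    kernel_5<<<1, 1>>>(d_out, n);
    cudaError_t err = cudaGetLastError();
    // A parent grid is not complete until its child grids are, so one
    // host-side synchronize waits for both kernels.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int *h_out = (int *)malloc(bytes);
    if (err == cudaSuccess) err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        free(h_out);
        return -1;
    }

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 2 * i) ++bad;
    free(h_out);
    return bad;
}

int main()
{
    int bad = run_demo(1000);
    printf("%s\n", bad == 0 ? "ok" : "FAILED");
    return bad == 0 ? 0 : 1;
}
```

Compiled for compute_30, or without -rdc=true, the launch inside kernel_5 triggers the same error quoted in the question.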


2013-10-10 05:11:13 kangshiyin

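To use dynamic parallelism, '--relocatable-device-code=true' (short form '-rdc=true') is also required. To avoid further errors, don't forget to link against the cudadevrt library.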

You can do something like this:

nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt

or

If you have two files, simple1.cu and test.c, you can do something like the following. This is called separate compilation.

nvcc -arch=sm_35 -dc simple1.cu 
nvcc -arch=sm_35 -dlink simple1.o -o link.o -lcudadevrt 
g++ -c test.c 
g++ link.o simple1.o test.o -o simple -L/usr/local/cuda/lib64/ -lcudart

This is also described in the CUDA programming guide.
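With separate compilation, simple1.cu has to export a host function that test.c can call, because test.c cannot launch kernels itself. Below is a minimal sketch of simple1.cu. The wrapper name launch_parent and its signature are hypothetical, not taken from the answer. Declaring it extern "C" keeps the symbol unmangled, so test.c can call it whether it is compiled as C or, as with the g++ command above, as C++.

```cuda
// simple1.cu (separate-compilation variant): kernels plus a host wrapper
// that test.c can call. launch_parent is a hypothetical name.
// Build: nvcc -arch=sm_35 -dc simple1.cu, then the -dlink and g++ steps above.
#include <cuda_runtime.h>

// Child kernel: element i receives i + 1.
__global__ void kernel_6(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i + 1;
}

// Parent kernel: launches the child grid from the device.
__global__ void kernel_5(int *out, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        kernel_6<<<(n + 127) / 128, 128>>>(out, n);
}

// Host entry point with C linkage (unmangled symbol) so that another
// translation unit such as test.c can call it. Returns 0 on success.
extern "C" int launch_parent(int *host_out, int n)
{
    int *d_out = 0;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) return -1;

    kernel_5<<<1, 1>>>(d_out, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // also waits for kernel_6
    if (err == cudaSuccess)
        err = cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : -1;
}
```

Since g++ compiles test.c as C++, test.c would declare the wrapper as extern "C" int launch_parent(int *host_out, int n); and then call it, for example with an int out[8] array.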


2013-10-10 08:11:26

Hi, thanks for your reply. I did this, but I get this error: fatal error: nvcc supports '--relocatable-device-code=true (-rdc=true)', '--device-c (-dc)', and '--device-link (-dlink)' only when targeting sm_20 or higher – user2865500

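Could you post exactly what you are doing, i.e. the simple program and the commands you are running? I just tried a simple program [here](http://pastebin.com/3Z2aGa4F) with the commands above.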

And please don't post your comments as answers.

Instructions for Visual Studio 2010:

1) View -> Property Pages 
2) Configuration Properties -> CUDA C/C++ -> Common -> Generate Relocatable Device Code -> Yes (-rdc=true) 
3) Configuration Properties -> CUDA C/C++ -> Device -> Code Generation -> compute_35,sm_35 
4) Configuration Properties -> Linker -> Input -> Additional Dependencies -> cudadevrt.lib
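If the Code Generation field in step 3 still lists an entry below compute_35 (for example a leftover compute_30,sm_30 next to compute_35,sm_35), nvcc compiles the device code once for each entry. Device-side launches cannot be compiled for the lower entries. A small preprocessor guard, offered as a sketch rather than something the answer requires, makes that mistake fail with an explicit message. It only helps once nvcc actually compiles device code. nvcc's own option check (the "-rdc ... only when targeting sm_20 or higher" message) is reported before that stage.

```cuda
// Hypothetical guard for the top of any .cu file that launches kernels from
// device code. __CUDA_ARCH__ is defined only while nvcc compiles device code
// (e.g. 300 for compute_30, 350 for compute_35), so host compilation is unaffected.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 350)
#error "Device-side kernel launches need compute_35 or higher: check CUDA C/C++ -> Device -> Code Generation"
#endif
```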


2013-10-10 09:33:13 JackOLantern

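I really appreciate your help. I have done this, and now I get this error. Please bear with me... nvcc: fatal error: nvcc supports '--relocatable-device-code=true (-rdc=true)', '--device-c (-dc)', and '--device-link (-dlink)' only when targeting sm_20 or higher – user2865500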

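Make sure you have done step 3 correctly. Can you compile the cdpLUDecomposition CUDA sample successfully? It uses dynamic parallelism to compute an LU decomposition. – JackOLantern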

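Yes, sir. I built the cdpLUDecomposition CUDA sample and it doesn't give any errors, but it gives me this: 'Native' has exited with code 0 (0x0). I have completed all the steps successfully. – user2865500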
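In Visual Studio, a message that the program has exited with code 0 (0x0) normally just means the process finished and returned 0. It does not indicate a failure. To double-check what the CUDA runtime reports for the GPU, independently of any project settings, a small query like the one below can help. The function name supports_dynamic_parallelism is hypothetical. The 3.5 threshold is the minimum compute capability for dynamic parallelism.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if device `dev` reports compute capability
// 3.5 or higher, 0 if lower, and -1 if the runtime query itself fails.
int supports_dynamic_parallelism(int dev)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return -1;
    }
    printf("Device %d: %s, compute capability %d.%d\n",
           dev, prop.name, prop.major, prop.minor);
    return (prop.major > 3 || (prop.major == 3 && prop.minor >= 5)) ? 1 : 0;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA device found\n");
        return 1;
    }
    int dev = 0;
    cudaGetDevice(&dev);  // currently selected device (0 by default)
    return supports_dynamic_parallelism(dev) == 1 ? 0 : 1;
}
```

A GTX 780 should report compute capability 3.5. If it does, the remaining difference lies in how the project is compiled, not in the hardware.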
