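cuMemcpyDtoH produces CUDA_ERROR_INVALID_VALUE
I have a very simple Scala JCuda program that adds two very large arrays. Everything compiles and runs until I try to copy more than 4 bytes from the device to the host. As soon as I try to copy more than 4 bytes, I get CUDA_ERROR_INVALID_VALUE.
// This fails and gives CUDA_ERROR_INVALID_VALUE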
var hostOutput = new Array[Int](numElements)
cuMemcpyDtoH(
Pointer.to(hostOutput),
deviceOutput,
8
)
// This runs just fine
var hostOutput = new Array[Int](numElements)
cuMemcpyDtoH(
Pointer.to(hostOutput),
deviceOutput,
4
)
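For reference when reading the two calls above: the last argument of `cuMemcpyDtoH` is a byte count, not an element count, and an `Int` occupies 4 bytes. So `4` copies exactly one element and `8` copies two. The following is a minimal sketch of that conversion; `CopySizes` and `bytesFor` are hypothetical names introduced only for illustration and are not part of JCuda.

```scala
object CopySizes {
  // A JVM Int is 4 bytes; JCuda's Sizeof.INT has the same value.
  val IntBytes: Int = java.lang.Integer.BYTES

  // Hypothetical helper: number of bytes needed to hold numInts Int values.
  // The result is a Long so large element counts cannot overflow.
  def bytesFor(numInts: Int): Long = numInts.toLong * IntBytes
}
```

Under this assumption, copying all `numElements` results back would need a byte count of `CopySizes.bytesFor(numElements)`, and the device buffer being read would have to be at least that large.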
To give better context for the actual program below, here is my kernel code, which compiles and runs just fine:
extern "C"
__global__ void add(int n, int *a, int *b, int *sum) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i<n)
{
sum[i] = a[i] + b[i];
}
}
Also, I am translating some Java sample code into Scala. Anyway, below is the entire program that I run:
package dev
import jcuda.driver.JCudaDriver._
import jcuda._
import jcuda.driver._
import jcuda.runtime._
/**
* Created by dev on 6/7/15.
*/
object TestCuda {
def init = {
JCudaDriver.setExceptionsEnabled(true)
// Input vector
// Output vector
// Load module
// Load the ptx file.
val kernelPath = "/home/dev/IdeaProjects/jniopencl/src/main/resources/kernels/JCudaVectorAddKernel30.cubin"
cuInit(0)
val device = new CUdevice
cuDeviceGet(device, 0)
val context = new CUcontext
cuCtxCreate(context, 0, device)
// Create and load module
val module = new CUmodule()
cuModuleLoad(module, kernelPath)
// Obtain a function pointer to the kernel function.
var add = new CUfunction()
cuModuleGetFunction(add, module, "add")
val numElements = 100000
val hostInputA = 1 to numElements toArray
val hostInputB = 1 to numElements toArray
val SI: Int = Sizeof.INT.asInstanceOf[Int]
// Allocate the device input data, and copy
// the host input data to the device
var deviceInputA = new CUdeviceptr
cuMemAlloc(deviceInputA, numElements * SI)
cuMemcpyHtoD(
deviceInputA,
Pointer.to(hostInputA),
numElements * SI
)
var deviceInputB = new CUdeviceptr
cuMemAlloc(deviceInputB, numElements * SI)
cuMemcpyHtoD(
deviceInputB,
Pointer.to(hostInputB),
numElements * SI
)
// Allocate device output memory
val deviceOutput = new CUdeviceptr()
cuMemAlloc(deviceOutput, SI)
// Set up the kernel parameters: A pointer to an array
// of pointers which point to the actual values.
val kernelParameters = Pointer.to(
Pointer.to(Array[Int](numElements)),
Pointer.to(deviceInputA),
Pointer.to(deviceInputB),
Pointer.to(deviceOutput)
)
// Call the kernel function
val blockSizeX = 256
val gridSizeX = Math.ceil(numElements/blockSizeX).asInstanceOf[Int]
cuLaunchKernel(
add,
gridSizeX, 1, 1,
blockSizeX, 1, 1,
0, null,
kernelParameters, null
)
cuCtxSynchronize
// **** Code pukes here with that error
// If I comment this out the program runs fine
var hostOutput = new Array[Int](numElements)
cuMemcpyDtoH(
Pointer.to(hostOutput),
deviceOutput,
numElements
)
hostOutput.foreach(print(_))
}
}
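Two details in the listing above may be worth checking; neither is confirmed as the cause. First, `deviceOutput` is allocated with `cuMemAlloc(deviceOutput, SI)`, which is 4 bytes, and that would be consistent with 4-byte copies succeeding while larger ones are rejected. Second, `numElements/blockSizeX` is an `Int` division, so it is truncated before `Math.ceil` sees it. The sketch below illustrates the second point with the listing's numbers; `GridSize`, `blocksFor` and `truncated` are hypothetical names used only here.

```scala
object GridSize {
  // Integer ceiling division: the smallest number of blocks whose
  // threads cover every element.
  def blocksFor(numElements: Int, blockSize: Int): Int =
    (numElements + blockSize - 1) / blockSize

  // Mirrors the expression in the listing: the Int division truncates
  // first, so ceil receives a whole number and has nothing to round up.
  def truncated(numElements: Int, blockSize: Int): Int =
    math.ceil(numElements / blockSize).toInt
}
```

With `numElements = 100000` and `blockSizeX = 256`, `truncated` gives 390 blocks (99840 threads), while `blocksFor` gives 391, which covers the remaining 160 elements. The `if (i<n)` guard in the kernel already protects the extra threads in the last block.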
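Anyway, just to let you know the specs of my machine: I am using a GTX 770M card with compute capability 3.0, running Ubuntu 14.04 on Optimus. I am also running NVCC version 5.5. Finally, I am running Scala 2.11.6 on Java 8. I am a noob and would greatly appreciate any help.
I don't know how to do [CUDA error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api/14038590#14038590) in jcuda, but I think you should check for errors after every CUDA API call, to be sure you identify the right place where the error actually occurs. –
@ms When `setExceptionsEnabled(true)` is set, the basic error checking (i.e. of the function return values) is done automatically. – Marco13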
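To illustrate the per-call checking discussed in these comments: the JCuda driver functions return an `Int` result code, so when exceptions are not enabled, each call can be wrapped in a small check. This is only a sketch; `CudaCheck` and `check` are hypothetical names, and the two constants are written out by hand from the CUDA driver API's `CUresult` values rather than imported from JCuda.

```scala
object CudaCheck {
  // Values from the CUDA driver API's CUresult enum, copied by hand.
  val CUDA_SUCCESS = 0
  val CUDA_ERROR_INVALID_VALUE = 1

  // Hypothetical helper: fail fast and name the call that returned the code,
  // so the failing API call is identified at the point where it happens.
  def check(call: String, result: Int): Unit =
    if (result != CUDA_SUCCESS)
      throw new RuntimeException(s"$call failed with CUresult $result")
}
```

A call site would then look like `CudaCheck.check("cuMemcpyDtoH", cuMemcpyDtoH(...))`. With `setExceptionsEnabled(true)`, as in the program above, this wrapper should be unnecessary because a failing call already throws.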