cuMemcpyDtoH yields CUDA_ERROR_INVALID_VALUE

I have a very simple Scala JCuda program that adds two very large arrays. Everything compiles and runs until I try to copy more than 4 bytes from the device to the host. As soon as I try to copy more than 4 bytes, I get CUDA_ERROR_INVALID_VALUE.

// This fails and gives CUDA_ERROR_INVALID_VALUE 
var hostOutput = new Array[Int](numElements) 
cuMemcpyDtoH(
    Pointer.to(hostOutput), 
    deviceOutput, 
    8 
) 

// This runs just fine 
var hostOutput = new Array[Int](numElements) 
cuMemcpyDtoH(
    Pointer.to(hostOutput), 
    deviceOutput, 
    4 
)

To give better context for the actual program, below is my kernel code, which compiles and runs just fine:

extern "C" 
__global__ void add(int n, int *a, int *b, int *sum) { 
    int i = blockIdx.x * blockDim.x + threadIdx.x; 
    if (i<n) 
    { 
     sum[i] = a[i] + b[i]; 
    } 
}

Also, I am translating some Java sample code into Scala. Anyway, below is the entire program that I run:

package dev 

import jcuda.driver.JCudaDriver._ 

import jcuda._ 
import jcuda.driver._ 
import jcuda.runtime._ 

/** 
* Created by dev on 6/7/15. 
*/ 
object TestCuda { 
    def init = { 
    JCudaDriver.setExceptionsEnabled(true) 

    // Input vector 

    // Output vector 

    // Load module 
    // Load the ptx file. 

    val kernelPath = "/home/dev/IdeaProjects/jniopencl/src/main/resources/kernels/JCudaVectorAddKernel30.cubin" 

    cuInit(0) 

    val device = new CUdevice 
    cuDeviceGet(device, 0) 
    val context = new CUcontext 
    cuCtxCreate(context, 0, device) 

    // Create and load module 
    val module = new CUmodule() 
    cuModuleLoad(module, kernelPath) 

    // Obtain a function pointer to the kernel function. 
    var add = new CUfunction() 
    cuModuleGetFunction(add, module, "add") 

    val numElements = 100000 

    val hostInputA = (1 to numElements).toArray 
    val hostInputB = (1 to numElements).toArray 
    val SI: Int = Sizeof.INT 

    // Allocate the device input data, and copy 
    // the host input data to the device 
    var deviceInputA = new CUdeviceptr 
    cuMemAlloc(deviceInputA, numElements * SI) 
    cuMemcpyHtoD(
     deviceInputA, 
     Pointer.to(hostInputA), 
     numElements * SI 
    ) 

    var deviceInputB = new CUdeviceptr 
    cuMemAlloc(deviceInputB, numElements * SI) 
    cuMemcpyHtoD(
     deviceInputB, 
     Pointer.to(hostInputB), 
     numElements * SI 
    ) 

    // Allocate device output memory 
    val deviceOutput = new CUdeviceptr() 
    cuMemAlloc(deviceOutput, SI) 

    // Set up the kernel parameters: A pointer to an array 
    // of pointers which point to the actual values. 
    val kernelParameters = Pointer.to(
     Pointer.to(Array[Int](numElements)), 
     Pointer.to(deviceInputA), 
     Pointer.to(deviceInputB), 
     Pointer.to(deviceOutput) 
    ) 

    // Call the kernel function 
    val blockSizeX = 256 
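    // Integer ceiling division; Math.ceil(numElements/blockSizeX) truncates 
    // before rounding and would leave the last partial block unlaunched 
    val gridSizeX = (numElements + blockSizeX - 1) / blockSizeX 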
    cuLaunchKernel(
     add, 
     gridSizeX, 1, 1, 
     blockSizeX, 1, 1, 
     0, null, 
     kernelParameters, null 
    ) 

    cuCtxSynchronize 

    // **** Code pukes here with that error 
    // If I comment this out the program runs fine 
    var hostOutput = new Array[Int](numElements) 
    cuMemcpyDtoH(
     Pointer.to(hostOutput), 
     deviceOutput, 
     numElements 
    ) 

    hostOutput.foreach(print(_)) 
    } 
}

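Anyway, just to let you know the specs of my machine: I am running Ubuntu 14.04 on Optimus with a GTX 770M card, which has compute capability 3.0. I am also running NVCC version 5.5. Finally, I am using Scala 2.11.6 with Java 8. I'm a noob and would greatly appreciate any help.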


2015-06-12 Dr.Knowitall

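I don't know how to do [CUDA error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api/14038590#14038590) in JCuda, but I think you should check for errors after every CUDA API call, so that you identify the exact place where the error actually occurs –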

@ms When 'setExceptionsEnabled(true)' is set, the basic error checks (i.e. of the function return values) are done automatically. – Marco13

Here

val deviceOutput = new CUdeviceptr() 
cuMemAlloc(deviceOutput, SI)

you allocate SI bytes, which is 4 bytes, the size of one int. Writing more than 4 bytes to this device pointer will mess things up. It should be

cuMemAlloc(deviceOutput, SI * numElements)

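Similarly, I think the problematic call should be

cuMemcpyDtoH(
    Pointer.to(hostOutput), 
    deviceOutput, 
    numElements * SI 
)

(note the * SI in the last argument).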
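
The byte arithmetic behind the allocation and the copy can be checked without a GPU. The following is only a minimal sketch and not part of the original program: `ByteCounts`, `IntBytes` and `intArrayBytes` are hypothetical names introduced here, and `IntBytes` is a stand-in for JCuda's `Sizeof.INT` (assumed to be 4, like a JVM Int), so the snippet runs without JCuda on the classpath.

```scala
object ByteCounts {
  // Stand-in for JCuda's Sizeof.INT (assumption: 4 bytes per Int, as on the JVM).
  val IntBytes: Int = java.lang.Integer.BYTES

  // Bytes needed for a buffer of Ints. Computed as Long because the
  // JCuda memory functions take their size argument as a long.
  def intArrayBytes(numElements: Int): Long = numElements.toLong * IntBytes

  def main(args: Array[String]): Unit = {
    val numElements = 100000
    val allocated = IntBytes.toLong            // what cuMemAlloc(deviceOutput, SI) reserves
    val required  = intArrayBytes(numElements) // what the kernel writes and the copy should read

    println(s"allocated: $allocated bytes")
    println(s"required:  $required bytes")
    // A copy is only in bounds if it requests no more than was allocated.
    println(s"8-byte copy fits in allocation: ${8L <= allocated}")
  }
}
```

With numElements = 100000, the output buffer needs 400000 bytes, while the question's code reserves only 4. The same byte count would be passed both to cuMemAlloc for deviceOutput and to cuMemcpyDtoH as its last argument.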


2015-06-12 08:09:19 Marco13

Ah, thank you so much! It is too easy to overlook these things. I was going to start using cuda-gdb, but is there a good way to debug native host code? –

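@Mr.Student In fact, the most important thing at the API level is 'setExceptionsEnabled(true)'. In this case it already pointed in the right direction, and a careful look at the arguments reveals the error (I'm not sure how far a tool could help beyond that...). Debugging kernels is a different story (unfortunately, it is much harder in Java/JCuda than with the tools available for native CUDA, such as NVIDIA Nsight; see also http://www.jcuda.org/debugging/Debugging.html) – Marco13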

Thanks, I appreciate all the help, Marco –
