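Is it normal to call cudaDeviceReset() after GPU computation in Matlab?
I can't use the GPU computing built into the latest version of Matlab because my GPU does not support Compute Capability 1.3+, and I don't want to pay a lot of money for Accelereyes Jacket just to use simple Cuda functions like cudaMemGetInfo(), or my own simple Cuda kernels. Does Matlab cause Cuda to leak memory because of CUcontext caching?
I found some very frustrating behavior when calling Cuda from Matlab. In Visual Studio 2008 I wrote a simple DLL that uses the standard MEX interface to run a single Cuda query: how much RAM is available on the device (Listing 1).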
// cudaMemoryCheck.cpp : Defines the exported functions for the DLL application.
#include <mex.h>
#include <cuda.h>
#include <driver_types.h>
#include <cuda_runtime_api.h>
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    size_t free = 0, total = 0;
    cudaError_t result = cudaMemGetInfo(&free, &total);
    mexPrintf("free memory in bytes %u (%u MB), total memory in bytes %u (%u MB). ", free, free/1024/1024, total, total/1024/1024);
    if(total > 0)
        mexPrintf("%2.2f%% free\n", (100.0*free)/total);
    else
        mexPrintf("\n");
    // this is the critical line!
    cudaDeviceReset();
}
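One side note on Listing 1: it passes size_t values to %u, which only lines up on a 32-bit build where both are 32 bits wide. The following is a minimal sketch, not the code that produced the output in this question, of how the same report line could be formatted portably. The helper name formatMemoryReport is hypothetical, and the sketch assumes a C++11 compiler (std::snprintf is not available in Visual Studio 2008).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of the original MEX file.
// Builds the same report line as Listing 1, but widens size_t to
// unsigned long long so the %llu specifiers match on 32- and 64-bit builds.
std::string formatMemoryReport(std::size_t freeBytes, std::size_t totalBytes)
{
    const unsigned long long f = freeBytes;
    const unsigned long long t = totalBytes;
    const unsigned long long mb = 1024ULL * 1024ULL;

    char buf[256];
    std::snprintf(buf, sizeof buf,
                  "free memory in bytes %llu (%llu MB), total memory in bytes %llu (%llu MB). ",
                  f, f / mb, t, t / mb);
    std::string out(buf);

    if (t > 0) {
        // Percentage of device memory that is still free.
        std::snprintf(buf, sizeof buf, "%2.2f%% free\n", (100.0 * f) / t);
        out += buf;
    } else {
        // total == 0 is what Listing 1 prints once cudaMemGetInfo() fails.
        out += "\n";
    }
    return out;
}
```

Inside mexFunction the result could be printed with mexPrintf("%s", formatMemoryReport(free, total).c_str()).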
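I compile this as a Win32 DLL project (Release mode), exporting mexFunction with a DEF file, and rename the DLL's file extension to .mexw32.
When I run cudaMemoryCheck from Matlab, I find that my GPU leaks memory if cudaDeviceReset() is commented out. Here is my trivial Matlab code (Listing 2):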
addpath('C:\Users\admin\Documents\Visual Studio 2008\Projects\cudaMemoryCheck\Release')
for i=1:20
    clear mex
    cudaMemoryCheck;
end
Running this in Matlab with the cudaDeviceReset() call in place, I see:
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
free memory in bytes 57393152 (54 MB), total memory in bytes 244776960 (233 MB). 23.45% free
The output from Matlab is very different when cudaDeviceReset() is commented out:
free memory in bytes 37019648 (35 MB), total memory in bytes 244776960 (233 MB). 15.12% free
free memory in bytes 25092096 (23 MB), total memory in bytes 244776960 (233 MB). 10.25% free
free memory in bytes 13549568 (12 MB), total memory in bytes 244776960 (233 MB). 5.54% free
free memory in bytes 12107776 (11 MB), total memory in bytes 244776960 (233 MB). 4.95% free
free memory in bytes 8568832 (8 MB), total memory in bytes 244776960 (233 MB). 3.50% free
free memory in bytes 9617408 (9 MB), total memory in bytes 244776960 (233 MB). 3.93% free
free memory in bytes 6078464 (5 MB), total memory in bytes 244776960 (233 MB). 2.48% free
free memory in bytes 8044544 (7 MB), total memory in bytes 244776960 (233 MB). 3.29% free
free memory in bytes 5816320 (5 MB), total memory in bytes 244776960 (233 MB). 2.38% free
free memory in bytes 7520256 (7 MB), total memory in bytes 244776960 (233 MB). 3.07% free
free memory in bytes 8830976 (8 MB), total memory in bytes 244776960 (233 MB). 3.61% free
free memory in bytes 5292032 (5 MB), total memory in bytes 244776960 (233 MB). 2.16% free
free memory in bytes 3407872 (3 MB), total memory in bytes 244776960 (233 MB). 1.39% free
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB).
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB).
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB).
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB).
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB).
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB).
free memory in bytes 0 (0 MB), total memory in bytes 0 (0 MB).
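So I conclude that, even though my MEX function allocates no memory on the GPU, the Cuda runtime API creates a new CUcontext every time the MEX function is run, and these contexts are never cleaned up until I close Matlab or call cudaDeviceReset(). Eventually the GPU runs out of memory, despite the fact that I have allocated nothing!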
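To put rough numbers on the drop, the free-byte readings from the output without cudaDeviceReset() can be differenced. This is only an arithmetic sketch over figures copied from that output; the helper name consumedPerCall is hypothetical, and it says nothing about what the driver does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given successive "free memory in bytes" readings,
// return how many bytes disappeared between each pair of consecutive calls.
std::vector<long long> consumedPerCall(const std::vector<long long>& freeBytes)
{
    std::vector<long long> deltas;
    for (std::size_t i = 1; i < freeBytes.size(); ++i) {
        deltas.push_back(freeBytes[i - 1] - freeBytes[i]);
    }
    return deltas;
}
```

Fed the first four readings (37019648, 25092096, 13549568, 12107776), this gives drops of 11927552, 11542528 and 1441792 bytes, so the first few calls each cost about 11 MB before the device gets close to empty.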
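I don't like using cudaDeviceReset(). The API says: "The function cudaDeviceReset() will deinitialize the context for the calling thread's current device immediately" and "It is the caller's responsibility to ensure that the device is not being accessed by any other host threads from the process when this function is called." In other words, calling cudaDeviceReset() can terminate other GPU computations immediately and without warning. I have not found any documentation saying that calling cudaDeviceReset() routinely is normal, so I don't want to do it. I will accept any answer that shows that using cudaDeviceReset() is normal and required.
Version info: NVIDIA GPU Computing Toolkit 4.0, Matlab 7.8.0 (R2009a, 32-bit), Windows 7 Enterprise SP1 (64-bit), Nvidia Quadro NVS 420 (latest Nvidia driver, 270.81).
I can also reproduce this problem on Windows XP (32-bit, SP3) with a GeForce 8400 GS, Matlab, Visual Studio, and the GPU Computing Toolkit.
Output of deviceQuery.exe: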
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Found 2 CUDA Capable device(s)
Device 0: "Quadro NVS 420"
CUDA Driver Version/Runtime Version 4.0/4.0
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 233 MBytes (244776960 bytes)
(1) Multiprocessors x (8) CUDA Cores/MP: 8 CUDA Cores
GPU Clock Speed: 1.40 GHz
Memory Clock rate: 700.00 Mhz
Memory Bus Width: 64-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and execution: No with 0 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: No
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID/PCI location ID: 3/0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "Quadro NVS 420"
CUDA Driver Version/Runtime Version 4.0/4.0
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 234 MBytes (244908032 bytes)
(1) Multiprocessors x (8) CUDA Cores/MP: 8 CUDA Cores
GPU Clock Speed: 1.40 GHz
Memory Clock rate: 700.00 Mhz
Memory Bus Width: 64-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and execution: No with 0 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: No
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID/PCI location ID: 4/0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 2, Device = Quadro NVS 420, Device = Quadro NVS 420
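Not calling clear mex does eliminate the memory leak, but it doesn't tell me why Matlab is keeping the cuContexts open. They should be destroyed when the DLL is unloaded! Even using mexAtExit does not fix it. It looks like the Matlab process itself has to exit to destroy them, which is frustrating. – user244795
Have you tried running 'version -modules' to see which DLLs are still in memory after calling 'clear mex'? – Edric
@Edric: +1 a useful (undocumented) feature, thanks for sharing this tip.. it might be useful for this [other question](http://stackoverflow.com/questions/7012408/mex-function-not-updated-after-recompile) – Amro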