Matrix multiplication with cuBLAS on Alea GPU

I want to use Gemm on Alea GPU for matrix multiplication, but this code gives the wrong result.

Gpu gpu = Gpu.Default; 
Blas blas = new Blas(gpu); 

int m=2,n=3; //in dimension and out dimension (output will be mxn matrix) 
int k=4; 

//column major 
float[,] A = new float[4,2] { {100,200},{2,6},{3,7},{4,8} }; //2x4 matrix 
float[,] B = new float[3,4] { {1,4,7,10}, {2,5,8,11}, {3,6,9,12} }; //4x3 matrix 
float[,] C = new float[3,2] { {-1,-1}, {-1,-1}, {-1,-1} }; //2x3 matrix 

var dA = gpu.AllocateDevice<float>(A); 
var dB = gpu.AllocateDevice<float>(B); 
var dC = gpu.AllocateDevice<float>(C); 

blas.Gemm(Operation.N,Operation.N,m,n,k,1f,dA.Ptr,m,dB.Ptr,k,0f,dC.Ptr,m); 

var result = Gpu.Copy2DToHost(dC);

This is the result I get. It just copies some numbers from matrix A, and some entries of matrix C keep their initial values.

100 -1 -1 
200 -1 -1

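What is wrong with this code? Please help.

I am using Alea 3.0.3 and CUDA Toolkit 8.0.

UPDATE 1: I found that it gives the correct result when I turn the A, B and C matrices into one-dimensional arrays. However, I would still like to know what is wrong with the 2D arrays.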

2017-09-04 koonyook

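I found that gpu.AllocateDevice for a 2D array does not lay the data out on the GPU the way it is laid out on the CPU. The distance between the first elements of any two consecutive columns (the pitch) is surprisingly large.

Therefore, the leading-dimension arguments (lda, ldb, ldc) must be changed to the pitch of each allocation.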

blas.Gemm(Operation.N,Operation.N,m,n,k,1f,dA.Ptr,dA.PitchInElements.ToInt32(),dB.Ptr,dB.PitchInElements.ToInt32(),0f,dC.Ptr,dC.PitchInElements.ToInt32());

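Now I get the correct result. However, is there any documentation that explains how 2D array allocation on the GPU really works in Alea?

I can only find http://www.aleagpu.com/release/3_0_3/api/html/6f0dc687-7191-91ba-6c30-bb379dded567.htm, which gives no explanation.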
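The effect of the leading dimension can be reproduced on the CPU. The following is a minimal sketch and not Alea or cuBLAS code: `Gemm` is a plain reference loop, `ToPitched` is a hypothetical helper that imitates a pitched allocation, and the pitch of 8 elements and the zero-filled padding are both assumptions made for illustration. The real pitch is whatever the allocation reports through `PitchInElements`.

```csharp
using System;

static class PitchedGemmSketch
{
    // Hypothetical pitch in elements; the real value comes from the allocation
    // (dA.PitchInElements in the answer) and is usually much larger.
    const int Pitch = 8;

    // Reference column-major GEMM without transposes: C = alpha*A*B + beta*C.
    // Element (i, j) of a matrix with leading dimension ld is at index i + j*ld.
    static void Gemm(int m, int n, int k, float alpha, float[] a, int lda,
                     float[] b, int ldb, float beta, float[] c, int ldc)
    {
        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++)
            {
                float sum = 0f;
                for (int p = 0; p < k; p++)
                    sum += a[i + p * lda] * b[p + j * ldb];
                c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
            }
    }

    // Copy a 2D array into a flat buffer whose rows start every Pitch elements.
    // Padding is zero here, which is an assumption about the device memory.
    static float[] ToPitched(float[,] src)
    {
        var dst = new float[src.GetLength(0) * Pitch];
        for (int r = 0; r < src.GetLength(0); r++)
            for (int col = 0; col < src.GetLength(1); col++)
                dst[r * Pitch + col] = src[r, col];
        return dst;
    }

    // Runs the multiplication from the question; returns C as a 2x3 array [i, j].
    public static float[,] Multiply(bool usePitchAsLd)
    {
        int m = 2, n = 3, k = 4;
        float[,] A = { {100,200},{2,6},{3,7},{4,8} };
        float[,] B = { {1,4,7,10},{2,5,8,11},{3,6,9,12} };
        float[,] C = { {-1,-1},{-1,-1},{-1,-1} };
        float[] a = ToPitched(A), b = ToPitched(B), c = ToPitched(C);

        if (usePitchAsLd) Gemm(m, n, k, 1f, a, Pitch, b, Pitch, 0f, c, Pitch);
        else              Gemm(m, n, k, 1f, a, m, b, k, 0f, c, m);

        // Read C back through the pitched layout, as a 2D copy to host would.
        var result = new float[m, n];
        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++)
                result[i, j] = c[j * Pitch + i];
        return result;
    }

    public static void Main()
    {
        foreach (bool fix in new[] { false, true })
        {
            float[,] r = Multiply(fix);
            Console.WriteLine(fix ? "ld = pitch:" : "ld = m, k, m:");
            for (int i = 0; i < 2; i++)
                Console.WriteLine($"{r[i, 0]} {r[i, 1]} {r[i, 2]}");
        }
    }
}
```

Under these assumptions, the run with ld = m, k, m prints `100 -1 -1` and `200 -1 -1`, the same pattern reported in the question: only the first column of A is read from real data, and most of the writes land in padding. The run with ld = pitch prints `169 278 387` and `353 574 795`, which is the expected product.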

2017-09-05 11:05:43 koonyook

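Most likely it uses [cudaMallocPitch](http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g32bd7a39135594788a542ae72217775c). The reason for the pitch is to align matrix rows with physical memory channels, which gives better performance in some kernels.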
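To make the idea of a pitch concrete, here is a small sketch of how a padded row width and an element offset can be computed. The 512-byte alignment is a hypothetical value chosen for illustration; in practice the pitch is returned by the allocation call and should be read from it rather than computed. `RoundUpPitch` and `ByteOffset` are illustrative helper names, not part of Alea or CUDA.

```csharp
using System;

static class PitchSketch
{
    // Round a row width in bytes up to the next multiple of `alignment`.
    public static int RoundUpPitch(int widthBytes, int alignment)
        => (widthBytes + alignment - 1) / alignment * alignment;

    // Byte offset of element (row, col) in a pitched allocation:
    // each row starts pitchBytes after the previous one.
    public static int ByteOffset(int row, int col, int pitchBytes, int elemSize)
        => row * pitchBytes + col * elemSize;

    public static void Main()
    {
        const int alignment = 512; // hypothetical; the real value is device dependent

        // A row of 2 floats, like each row of the float[4,2] array in the question.
        int pitchBytes = RoundUpPitch(2 * sizeof(float), alignment);
        int pitchInElements = pitchBytes / sizeof(float);

        Console.WriteLine($"pitch: {pitchBytes} bytes = {pitchInElements} floats");
        Console.WriteLine($"row 1 starts at byte {ByteOffset(1, 0, pitchBytes, sizeof(float))}");
    }
}
```

With this assumed alignment, a row holding only 2 floats still occupies 128 floats of address space, which would explain why the gap between consecutive rows looks surprisingly large.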
