2011-02-17 88 views
15

How do I use a 2D array in CUDA?

I am new to CUDA. How do I allocate a 2D array of size M×N, and how do I traverse that array in CUDA? Please give me some sample code.

Hi, thanks for your reply. I used your code in the following program, but I am not getting correct results.

__global__ void test(int A[BLOCK_SIZE][BLOCK_SIZE], int B[BLOCK_SIZE][BLOCK_SIZE],int C[BLOCK_SIZE][BLOCK_SIZE]) 
{ 

    int i = blockIdx.y * blockDim.y + threadIdx.y; 
    int j = blockIdx.x * blockDim.x + threadIdx.x; 

    if (i < BLOCK_SIZE && j < BLOCK_SIZE) 
     C[i][j] = A[i][j] + B[i][j]; 

} 

int main() 
{ 

    int d_A[BLOCK_SIZE][BLOCK_SIZE]; 
    int d_B[BLOCK_SIZE][BLOCK_SIZE]; 
    int d_C[BLOCK_SIZE][BLOCK_SIZE]; 

    int C[BLOCK_SIZE][BLOCK_SIZE]; 

    for(int i=0;i<BLOCK_SIZE;i++) 
     for(int j=0;j<BLOCK_SIZE;j++) 
     { 
     d_A[i][j]=i+j; 
     d_B[i][j]=i+j; 
     } 


    dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE); 
    dim3 dimGrid(GRID_SIZE, GRID_SIZE); 

    test<<<dimGrid, dimBlock>>>(d_A,d_B,d_C); 

    cudaMemcpy(C,d_C,BLOCK_SIZE*BLOCK_SIZE , cudaMemcpyDeviceToHost); 

    for(int i=0;i<BLOCK_SIZE;i++) 
     for(int j=0;j<BLOCK_SIZE;j++) 
     { 
     printf("%d\n",C[i][j]); 

     } 
} 

Please help me.

+16

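You could be more polite; it wouldn't hurt you. – karlphillip 2011-02-17 14:28:59

+1

You cannot get the values of a 2D array back with cudaMemcpy alone; instead you have to use cudaMallocPitch, or cudaPitchedPtr with cudaMalloc3D, as @Dave said – ardiyu07 2011-02-17 17:26:35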

Answers

16

How to allocate a 2D array:

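#include <cuda_runtime.h>
#define BLOCK_SIZE 16
#define GRID_SIZE 1

int main(){
    int h_A[BLOCK_SIZE][BLOCK_SIZE];              // host copy of the input
    int (*d_A)[BLOCK_SIZE], (*d_B)[BLOCK_SIZE];   // device pointers to rows of BLOCK_SIZE ints
    size_t bytes = sizeof(h_A);                   // BLOCK_SIZE*BLOCK_SIZE*sizeof(int)

    /* h_A initialization */

    // The kernel can only dereference device memory, so allocate it and copy the input over
    cudaMalloc((void**)&d_A, bytes);
    cudaMalloc((void**)&d_B, bytes);
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);

    dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE); // so your threads are BLOCK_SIZE*BLOCK_SIZE, 256 in this case
    dim3 dimGrid(GRID_SIZE, GRID_SIZE);    // 1*1 blocks in a grid

    YourKernel<<<dimGrid, dimBlock>>>(d_A, d_B); // Kernel invocation
    cudaDeviceSynchronize();                     // wait for the kernel to finish

    cudaFree(d_A);
    cudaFree(d_B);
    return 0;
}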

How to traverse the array:

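__global__ void YourKernel(int d_A[BLOCK_SIZE][BLOCK_SIZE], int d_B[BLOCK_SIZE][BLOCK_SIZE]){
    int row = blockIdx.y * blockDim.y + threadIdx.y; // global row index of this thread
    int col = blockIdx.x * blockDim.x + threadIdx.x; // global column index of this thread
    if (row >= BLOCK_SIZE || col >= BLOCK_SIZE) return; // skip threads that fall outside the array
    /* whatever you wanna do with d_A[row][col] and d_B[row][col] */
}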

I hope this helps. You can also refer to page 22 of the CUDA Programming Guide, which covers matrix multiplication.
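The comment under the question mentions cudaMallocPitch. As a rough sketch of what that route can look like (assuming a row-major host array; the names `add_one_pitched` and `add_one_on_device` are made up for illustration and are not from this answer), each device row starts at a multiple of `pitch` bytes, and cudaMemcpy2D handles the difference between the host and device row strides:

```cuda
#include <cuda_runtime.h>

// Adds 1 to every element. Rows are `pitch` bytes apart on the device,
// so the row pointer is computed with byte arithmetic.
__global__ void add_one_pitched(int *data, size_t pitch, int rows, int cols)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows || col >= cols) return;
    int *row_ptr = (int *)((char *)data + row * pitch);
    row_ptr[col] += 1;
}

// Hypothetical host wrapper: `host` is a rows x cols row-major array.
// Returns cudaSuccess or the first error encountered.
cudaError_t add_one_on_device(int *host, int rows, int cols)
{
    int *dev = NULL;
    size_t pitch = 0;
    size_t row_bytes = cols * sizeof(int);

    cudaError_t err = cudaMallocPitch((void **)&dev, &pitch, row_bytes, rows);
    if (err == cudaSuccess)
        err = cudaMemcpy2D(dev, pitch, host, row_bytes, row_bytes, rows,
                           cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
        add_one_pitched<<<grid, block>>>(dev, pitch, rows, cols);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy2D(host, row_bytes, dev, pitch, row_bytes, rows,
                           cudaMemcpyDeviceToHost);
    cudaFree(dev); // cudaFree(NULL) is a no-op
    return err;
}
```

Calling `add_one_on_device(h, rows, cols)` from main on an initialized host array should leave every element incremented by one. For small matrices the padding brings little benefit; it is mainly there to keep rows aligned for coalesced access.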

4

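The best way is to store the 2D array A in vector (flattened) form. For example, say you have a matrix A of size n×m. In pointer-to-pointer representation, its (i,j) element is

A[i][j] (with i=0..n-1 and j=0..m-1).

In vector form it can be written as

A[i*m+j] (with i=0..n-1 and j=0..m-1),

because each row holds m elements.

Using a one-dimensional array in this case simplifies the copy process, which becomes simply: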
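A quick check of the index arithmetic may help here: in row-major storage each row holds m elements (the number of columns), so element (i,j) sits at offset i*m + j. The helper name `flat_index` below is hypothetical and only meant as a sketch:

```cuda
#include <cstddef>

// Hypothetical helper (not part of the original answer): maps (i, j) of a
// matrix with `cols` columns to its offset in a flat row-major buffer.
// Marked __host__ __device__ so the same function can be used inside a kernel.
__host__ __device__ inline size_t flat_index(size_t i, size_t j, size_t cols)
{
    return i * cols + j;
}
```

For a 3×4 matrix, `flat_index(1, 2, 4)` is 6, and the last element (2, 3) lands at 11, which is n*m - 1 as expected.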

double *A, *dev_A; // A - host pointer, dev_A - device pointer
A = (double*)malloc(n*m*sizeof(double));
cudaMalloc((void**)&dev_A, n*m*sizeof(double));
cudaMemcpy(dev_A, A, n*m*sizeof(double), cudaMemcpyHostToDevice); // A holds doubles; pass the pointers themselves, not their addresses
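To show the whole round trip with flattened arrays, here is a minimal sketch that adds two n×m matrices on the device and copies the result back. The names `add_flat` and `add_matrices` and the 16×16 block size are assumptions chosen for illustration, not part of the answer above:

```cuda
#include <cuda_runtime.h>

// C = A + B for n x m matrices stored row-major in flat buffers.
__global__ void add_flat(const double *A, const double *B, double *C, int n, int m)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y; // row
    int j = blockIdx.x * blockDim.x + threadIdx.x; // column
    if (i < n && j < m)
        C[i * m + j] = A[i * m + j] + B[i * m + j];
}

// Hypothetical host wrapper: A, B and C are host buffers of n*m doubles.
// Returns cudaSuccess or the first error encountered.
cudaError_t add_matrices(const double *A, const double *B, double *C, int n, int m)
{
    size_t bytes = (size_t)n * m * sizeof(double);
    double *dev_A = NULL, *dev_B = NULL, *dev_C = NULL;

    cudaError_t err = cudaMalloc((void **)&dev_A, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_B, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_C, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dev_A, A, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dev_B, B, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        // Round up so the grid covers matrices that are not multiples of 16
        dim3 grid((m + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        add_flat<<<grid, block>>>(dev_A, dev_B, dev_C, n, m);
        err = cudaGetLastError();
    }
    // This copy waits for the kernel on the default stream before reading dev_C
    if (err == cudaSuccess) err = cudaMemcpy(C, dev_C, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev_A); // cudaFree(NULL) is a no-op, so this is safe after a failure
    cudaFree(dev_B);
    cudaFree(dev_C);
    return err;
}
```

Note that the byte count passed to cudaMemcpy includes sizeof(double); passing only the element count, as in the question's program, copies just part of the array.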