Stack overflow in a CUDA program that doubles a matrix

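I wrote a program that doubles every element of a given matrix. If I change the matrix size to 500, it "stops working" because of an overflow. Can someone help me understand why? (It works fine for 100.)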

#include "cuda_runtime.h" 
#include "device_launch_parameters.h" 

#include <stdio.h> 
#include <stdlib.h> 

// Doubles each of the first n elements of a into c.
__global__ void kernel_double(int *c, const int *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The grid is rounded up to whole blocks, so skip threads past the end.
    if (i < n) {
        c[i] = a[i] * 2;
    }
}

int main()
{
    const int size = 100;
    // failed when size = 500, Unhandled exception at 0x0in
    // doublify.exe: 0xC00000FD:
    // Stack overflow (parameters: 0x00000000, 0x00602000).
    int a[size][size], c[size][size];
    int sum_a = 0;
    int sum_c = 0;

    // Fill the input matrix on the host and record its sum.
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            a[i][j] = rand() % 10;
            sum_a += a[i][j];
        }
    }
    printf("sum of matrix a is %d \n", sum_a);

    // Copy the input to the device, run the kernel, and copy the result back.
    int *dev_a = 0;
    int *dev_c = 0;
    cudaMalloc((void**)&dev_c, size * size * sizeof(int));
    cudaMalloc((void**)&dev_a, size * size * sizeof(int));
    cudaMemcpy(dev_a, a, size * size * sizeof(int), cudaMemcpyHostToDevice);
    printf("grid size %d \n", int(size * size/1024) + 1);
    kernel_double<<<int(size * size/1024) + 1, 1024>>>(dev_c, dev_a, size * size);
    cudaDeviceSynchronize();
    cudaMemcpy(c, dev_c, size * size * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_c);
    cudaFree(dev_a);

    // The sum of the output should be twice the sum of the input.
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            sum_c += c[i][j];
        }
    }
    printf("sum of matrix c is %d \n", sum_c);
    return 0;
}

Here is the output when size is 100:

sum of matrix a is 44949 
grid size 10 
sum of matrix c is 89898 
Press any key to continue . . .

My development environment is MSVS 2015 (v14), CUDA 8.0, and a GTX 1050 Ti.

2016-12-12 B.Mr.W.

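You're getting a stack overflow at size 500 because you declare two local arrays of 250,000 elements each. Together they take about 2 MB of stack space, which is more than the 1 MB that the MSVC linker reserves for the stack by default.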
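
The arithmetic behind that estimate can be checked in a few lines of C++. This is only a sketch: `stack_bytes_needed`, `overflows_stack` and `kDefaultStackBytes` are hypothetical names, and the 1 MB budget is an assumption based on the MSVC linker's default stack reserve, which the original post does not state.

```cpp
#include <cstddef>

// Assumption: default stack reserve of the MSVC linker (1 MB).
constexpr std::size_t kDefaultStackBytes = 1024 * 1024;

// Bytes taken by the two local arrays a[size][size] and c[size][size].
constexpr std::size_t stack_bytes_needed(std::size_t size) {
    return 2 * size * size * sizeof(int);
}

// True when the two arrays alone already exceed the assumed stack budget.
constexpr bool overflows_stack(std::size_t size) {
    return stack_bytes_needed(size) > kDefaultStackBytes;
}
```

With a 4-byte int, size = 100 needs 80,000 bytes and fits, while size = 500 needs 2,000,000 bytes and does not.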

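You could pass a linker option to increase the initial stack size, but a better solution is to allocate the space for your arrays dynamically. (You could create a class that contains the arrays, then allocate a single instance of that class.)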

For example, add a new struct inside your main function:

struct mats { 
    int a[size][size]; 
    int c[size][size]; 
};

Then, in your main, remove the a and c arrays and replace them with

auto ary = std::make_unique<mats>();

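Everywhere you reference a or c, use ary->a and ary->c instead. std::make_unique is declared in the <memory> header, so include it. (When ary goes out of scope, the unique_ptr automatically frees the allocated memory.)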
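
Put together, the heap-allocated version might look roughly like the sketch below. It is host-only so that it compiles without nvcc: a plain CPU loop stands in for the cudaMemcpy, kernel launch and cudaMemcpy sequence of the question. `kSize`, `sums` and `run_doubling` are hypothetical names, the struct is declared at file scope rather than inside main, and the deterministic fill replaces rand() for illustration only.

```cpp
#include <memory>

// Hypothetical stand-in for the question's `size`, at the value that crashed.
constexpr int kSize = 500;

struct mats {
    int a[kSize][kSize];
    int c[kSize][kSize];
};

struct sums {
    long long a;
    long long c;
};

// Fills a, doubles it into c on the CPU, and returns both sums.
// The CPU loop is a placeholder for the device round trip in the question.
sums run_doubling() {
    auto ary = std::make_unique<mats>();  // about 2 MB on the heap, not the stack
    sums s{0, 0};
    for (int i = 0; i < kSize; i++) {
        for (int j = 0; j < kSize; j++) {
            ary->a[i][j] = (i + j) % 10;  // deterministic fill instead of rand()
            s.a += ary->a[i][j];
            ary->c[i][j] = ary->a[i][j] * 2;
            s.c += ary->c[i][j];
        }
    }
    return s;  // ary is released here when the unique_ptr goes out of scope
}
```

In the CUDA program, ary->a and ary->c would be passed to cudaMemcpy in the places where the question passes a and c.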

2016-12-12 04:00:01 1201ProgramAlarm

Could you elaborate on the dynamic allocation part, perhaps with some example code? I'm new to C++, so more detail would be much appreciated!

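If you want to create a contiguous two-dimensional dynamic array (which I believe is what you need here), [see here](http://stackoverflow.com/questions/21943621/how-to-create-a-contiguous-2d-array-in-c/21944048#21944048). Change the allocation from 'new[]' to the CUDA allocation functions. – PaulMcKenzie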

@B.Mr.W. Example code added. – 1201ProgramAlarm
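
One way to picture the contiguous two-dimensional array suggested in the comments is a single flat buffer with row-major indexing. The sketch below is an assumption-laden illustration, not the code from the linked answer: `FlatMatrix` is a hypothetical helper, it uses std::vector rather than new[], and it does not show the CUDA allocation functions the comment mentions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols matrix stored in one contiguous buffer.
struct FlatMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<int> data;  // heap storage, row-major, zero-initialized

    FlatMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0) {}

    // Element (i, j) lives at offset i * cols + j.
    int& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }

    // Size in bytes, in the form cudaMalloc and cudaMemcpy take it.
    std::size_t bytes() const { return data.size() * sizeof(int); }
};
```

Because the storage is one block, `m.data.data()` and `m.bytes()` could be handed to cudaMemcpy directly, and the size no longer has to be a compile-time constant.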
