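CUDA double matrix overflow
I wrote a program that doubles every element of a given matrix. If I change the matrix size to 500, it "stops working" because of a stack overflow. Can someone help me understand why? (It works fine with 100.)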
    #include "cuda_runtime.h"
    #include "device_launch_parameters.h"
    #include <stdio.h>
    #include <stdlib.h>

    __global__ void kernel_double(int *c, int *a)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        c[i] = a[i] * 2;
    }

    int main()
    {
        const int size = 100;
        // failed when size = 500, Unhandled exception at 0x0in
        // doublify.exe: 0xC00000FD:
        // Stack overflow (parameters: 0x00000000, 0x00602000).
        int a[size][size], c[size][size];
        int sum_a = 0;
        int sum_c = 0;

        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                a[i][j] = rand() % 10;
                sum_a += a[i][j];
            }
        }
        printf("sum of matrix a is %d \n", sum_a);

        int *dev_a = 0;
        int *dev_c = 0;
        cudaMalloc((void**)&dev_c, size * size * sizeof(int));
        cudaMalloc((void**)&dev_a, size * size * sizeof(int));
        cudaMemcpy(dev_a, a, size * size * sizeof(int), cudaMemcpyHostToDevice);

        printf("grid size %d \n", int(size * size / 1024) + 1);
        kernel_double<<<int(size * size / 1024) + 1, 1024>>>(dev_c, dev_a);
        cudaDeviceSynchronize();

        cudaMemcpy(c, dev_c, size * size * sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(dev_c);
        cudaFree(dev_a);

        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                sum_c += c[i][j];
            }
        }
        printf("sum of matrix c is %d \n", sum_c);
        return 0;
    }
Here is the output when size is 100:
sum of matrix a is 44949
grid size 10
sum of matrix c is 89898
Press any key to continue . . .
My development environment is MSVS 2015 (v14), CUDA 8.0, and a GTX 1050 Ti.
Could you elaborate on the dynamic allocation part, perhaps with some sample code? I'm new to C++, so more detail would be much appreciated!
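If you want to create a contiguous dynamic 2D array (which I believe is what you need), [see here](http://stackoverflow.com/questions/21943621/how-to-create-a-contiguous-2d-array-in-c/21944048#21944048). Change the allocation from `new[]` to the CUDA allocation functions. – PaulMcKenzie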
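For what it's worth, here is a minimal host-side sketch of the contiguous layout described in the comment above. It is not the code from the linked answer: it swaps in `std::vector` for `new[]`, and `HostMatrix` and `fill_random` are hypothetical names made up for illustration. The idea is only that the matrix lives in a single heap buffer rather than in a local `int a[size][size]`, so its size no longer counts against the thread's stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not from the original post): a size x size matrix
// stored row-major in one contiguous heap buffer.
struct HostMatrix {
    std::size_t size;       // number of rows and columns
    std::vector<int> data;  // size * size elements, allocated on the heap

    explicit HostMatrix(std::size_t n) : size(n), data(n * n, 0) {}

    // Element (i, j) lives at offset i * size + j in the flat buffer.
    int& at(std::size_t i, std::size_t j) {
        assert(i < size && j < size);
        return data[i * size + j];
    }

    // Byte count of the whole buffer, e.g. for a memory copy.
    std::size_t bytes() const { return data.size() * sizeof(int); }
};

// Fill the matrix with values in [0, 9], as the question's loop does,
// and return the sum of all elements.
long long fill_random(HostMatrix& m) {
    long long sum = 0;
    for (std::size_t i = 0; i < m.size; ++i) {
        for (std::size_t j = 0; j < m.size; ++j) {
            m.at(i, j) = std::rand() % 10;
            sum += m.at(i, j);
        }
    }
    return sum;
}
```

In the question's `main`, something like `HostMatrix a(500), c(500);` would presumably replace the two local arrays, and the copies would then use the flat buffer, e.g. `cudaMemcpy(dev_a, a.data.data(), a.bytes(), cudaMemcpyHostToDevice);`. Because the storage is contiguous, the kernel can keep indexing it as a 1D array.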
@B.Mr.W. Sample code added. – 1201ProgramAlarm