我目前正在研究一個必須實現2D-FFT(用於交叉關聯)的程序。我用CUDA做了一次FFT,它給了我正確的結果,我現在正在試圖實現一個2D版本。在線上很少的例子和文檔,我發現很難找出錯誤是什麼。CUDA套箍2D示例
到目前爲止,我一直只使用cuFFT手冊。
無論如何,我已經創建了兩個5x5陣列,並填充1。我已經將它們複製到GPU存儲器中,並完成了前向FFT,將它們相乘,然後對結果進行ifft處理。這給了我一個值爲650的5x5陣列。我期望在5x5陣列中的一個插槽中得到值爲25的DC信號。相反,我在整個陣列中獲得了650個。
此外,我不允許在將信號複製到GPU內存後打印出信號的值。寫作
cout << d_signal[1].x << endl;
給我一個acces侵犯。我在其他cuda程序中也做了同樣的事情,但這不是問題。它與複雜變量的工作方式有關,還是人爲錯誤?
如果任何人有任何問題的指針,我將不勝感激。下面是代碼
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <helper_functions.h>
#include <helper_cuda.h>
#include <ctime>
#include <time.h>
#include <stdio.h>
#include <iostream>
#include <math.h>
#include <cufft.h>
#include <fstream>
using namespace std;
typedef float2 Complex;
__global__ void ComplexMUL(Complex *a, Complex *b)
{
int i = threadIdx.x;
a[i].x = a[i].x * b[i].x - a[i].y*b[i].y;
a[i].y = a[i].x * b[i].y + a[i].y*b[i].x;
}
int main()
{
int N = 5;
int SIZE = N*N;
Complex *fg = new Complex[SIZE];
for (int i = 0; i < SIZE; i++){
fg[i].x = 1;
fg[i].y = 0;
}
Complex *fig = new Complex[SIZE];
for (int i = 0; i < SIZE; i++){
fig[i].x = 1; //
fig[i].y = 0;
}
for (int i = 0; i < 24; i=i+5)
{
cout << fg[i].x << " " << fg[i + 1].x << " " << fg[i + 2].x << " " << fg[i + 3].x << " " << fg[i + 4].x << endl;
}
cout << "----------------" << endl;
for (int i = 0; i < 24; i = i + 5)
{
cout << fig[i].x << " " << fig[i + 1].x << " " << fig[i + 2].x << " " << fig[i + 3].x << " " << fig[i + 4].x << endl;
}
cout << "----------------" << endl;
int mem_size = sizeof(Complex)* SIZE;
cufftComplex *d_signal;
checkCudaErrors(cudaMalloc((void **) &d_signal, mem_size));
checkCudaErrors(cudaMemcpy(d_signal, fg, mem_size, cudaMemcpyHostToDevice));
cufftComplex *d_filter_kernel;
checkCudaErrors(cudaMalloc((void **)&d_filter_kernel, mem_size));
checkCudaErrors(cudaMemcpy(d_filter_kernel, fig, mem_size, cudaMemcpyHostToDevice));
// cout << d_signal[1].x << endl;
// CUFFT plan
cufftHandle plan;
cufftPlan2d(&plan, N, N, CUFFT_C2C);
// Transform signal and filter
printf("Transforming signal cufftExecR2C\n");
cufftExecC2C(plan, (cufftComplex *)d_signal, (cufftComplex *)d_signal, CUFFT_FORWARD);
cufftExecC2C(plan, (cufftComplex *)d_filter_kernel, (cufftComplex *)d_filter_kernel, CUFFT_FORWARD);
printf("Launching Complex multiplication<<< >>>\n");
ComplexMUL <<< 32, 256 >> >(d_signal, d_filter_kernel);
// Transform signal back
printf("Transforming signal back cufftExecC2C\n");
cufftExecC2C(plan, (cufftComplex *)d_signal, (cufftComplex *)d_signal, CUFFT_INVERSE);
Complex *result = new Complex[SIZE];
cudaMemcpy(result, d_signal, sizeof(Complex)*SIZE, cudaMemcpyDeviceToHost);
for (int i = 0; i < SIZE; i=i+5)
{
cout << result[i].x << " " << result[i + 1].x << " " << result[i + 2].x << " " << result[i + 3].x << " " << result[i + 4].x << endl;
}
delete result, fg, fig;
cufftDestroy(plan);
//cufftDestroy(plan2);
cudaFree(d_signal);
cudaFree(d_filter_kernel);
}
上面的代碼給出以下端子輸出:
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
----------------
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
----------------
Transforming signal cufftExecR2C
Launching Complex multiplication<<< >>>
Transforming signal back cufftExecC2C
625 625 625 625 625
625 625 625 625 625
625 625 625 625 625
625 625 625 625 625
625 625 625 625 625
您發佈的代碼是不完整的,無法編譯。你能解決這個問題嗎?如果不編譯和運行代碼很難告訴你什麼可能是錯誤的,我現在不能這麼做 - – talonmies
當然,我有一些我不想包括的未註釋的部分。我已經刪除它,並將所有內容編輯到我的帖子中。 – LukaK