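Variables copied incorrectly from Fortran to a CUDA C program
I have been trying to write a program that uses the GPU to compute an integral by Gaussian quadrature, and I have been trying to work out why it does not give the right result. I think I have narrowed the problem down to the arguments passed in the call to d_one: they are not copied correctly into the CUDA C code. I don't know why this happens. I have spent a lot of time on it but cannot figure it out.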
Here are the two programs.
The Fortran program:
implicit real*8(a-h,o-z)
parameter (nlinx = 22) ! Total number of mesh regions
dimension sx(3*nlinx),swx(3*nlinx)
xa = 0.d0
xb = 5.d0
! In the following "nptx" is the total number of integration
! points. So, it is (nlinx * 3)
call meshwt1(xa,xb,nlinx,ntan,sx,swx,nptx)
ans0 = 0.d0
CAll d_one(sx, swx, nptx, ans0)
print *, ans0
stop
end
SUBROUTINE MESHWT1(A,B,N,NT,X,W,NTOT)
implicit real*8(a-h,o-z)
!3*N LINEAR POINTS FOR A TO B
!NT=0 OR 1, 3*NT TAN PTS FOR B TO INFINITY
!NTOT= 3*(N+NT)
DIMENSION X(*),W(*),G(3),GW(3)
G(1) = -0.7745966
G(2) = 0.0000000
G(3) = -G(1)
GW(2) = 0.8888888
GW(1) = 0.5555555
GW(3) = GW(1)
Y = N
DX = (B - A)/Y
K = 0
XA = A - DX
XB = A
DO 2 I = 1, N
XA = XA + DX
XB = XB + DX
DO 2 J = 1, 3
K = K + 1
X(K) = 0.5 * (XA + XB) + 0.5 * (XB - XA) * G(J)
2 W(K) = 0.5 * (XB - XA) * GW(J)
NTOT = K
IF(NT .EQ. 1) GO TO 3
GO TO 5
3 NTOT = K + 3
DO 4 J = 1, 3
K = K + 1
Y = (1.0 + G(J)) * 3.14159 * 0.25
X(K) = XB + DTAN(Y)
4 W(K) = GW(J) * 3.14159 * 0.25/(DCOS(Y)) ** 2
5 CONTINUE
RETURN
END
The CUDA program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda.h>
#include <cuda_runtime.h>
__global__ void loop_d(float *a, float *b, int N, float *ans)
{
__shared__ float temp[66];
int idx = threadIdx.x;
if (idx < 66)
{
temp[idx] = a[idx] * b[idx];
}
__syncthreads();
if (0 == idx)
{
float sum = 0.0;
for (int i=0; i < 66; i++)
{
sum += temp[i];
}
*ans = sum;
}
}
// The following function is called from the Fortran program
extern "C" void d_one_(float *a, float *b, int *Np, float *ans)
{
float *a_d, *b_d, *ans_d; // Declaring GPU Copies of the parameters passed
int blocks = 1; // Number of blocks used
int N = *Np; // Number of threads is determined by the parameter nptx passed from the Fortran program
// Allocating GPU memory
cudaMalloc((void **)&a_d, sizeof(float) * N);
cudaMalloc((void **)&b_d, sizeof(float) * N);
cudaMalloc((void **)&ans_d, sizeof(float));
// Copying information from CPU to GPU
cudaMemcpy(a_d, a, sizeof(float) * N, cudaMemcpyHostToDevice);
cudaMemcpy(b_d, b, sizeof(float) * N, cudaMemcpyHostToDevice);
cudaMemcpy(ans_d, ans, sizeof(float), cudaMemcpyHostToDevice);
// Calling the function on the GPU
loop_d<<< blocks, N >>>(a_d, b_d, N, ans_d);
cudaMemcpy(a, a_d, sizeof(float) * N, cudaMemcpyDeviceToHost);
cudaMemcpy(b, b_d, sizeof(float) * N, cudaMemcpyDeviceToHost);
cudaMemcpy(ans, ans_d, sizeof(float), cudaMemcpyDeviceToHost);
// Freeing GPU memory
cudaFree(a_d);
cudaFree(b_d);
cudaFree(ans_d);
return;
}
The output of the program should be 12.49999. Instead I get -314. Thanks for any input!
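As a sanity check on the expected value, the sum the kernel is meant to compute can be reproduced on the host in double precision. The following is a minimal sketch, not part of the original post: `mesh_weights` and `weighted_sum` are hypothetical names, and `mesh_weights` only mirrors the linear part of MESHWT1 (the NT = 0 case), using the same truncated constants.

```c
#include <stddef.h>

/* Fill x[] and w[] with 3-point Gauss-Legendre nodes and weights on n equal
   sub-intervals of [a, b]. Mirrors the linear part of MESHWT1 (NT = 0).
   Returns the number of points written, which is 3 * n. */
int mesh_weights(double a, double b, int n, double *x, double *w)
{
    /* Same truncated constants as in the Fortran source. */
    const double g[3]  = { -0.7745966, 0.0, 0.7745966 };
    const double gw[3] = {  0.5555555, 0.8888888, 0.5555555 };
    const double dx = (b - a) / n;
    int k = 0;

    for (int i = 0; i < n; i++) {
        const double xa = a + i * dx;   /* left edge of sub-interval i  */
        const double xb = xa + dx;      /* right edge of sub-interval i */
        for (int j = 0; j < 3; j++) {
            x[k] = 0.5 * (xa + xb) + 0.5 * (xb - xa) * g[j];
            w[k] = 0.5 * (xb - xa) * gw[j];
            k++;
        }
    }
    return k;
}

/* Host-side reference for the kernel: sum of x[k] * w[k] over n points,
   i.e. the quadrature estimate of the integral of f(x) = x. */
double weighted_sum(const double *x, const double *w, size_t n)
{
    double sum = 0.0;
    for (size_t k = 0; k < n; k++) {
        sum += x[k] * w[k];
    }
    return sum;
}
```

With `a = 0`, `b = 5` and `n = 22` this fills 66 points, and the weighted sum approximates the integral of x from 0 to 5, which is 12.5. Because the three weights add up to 1.9999998 rather than 2, the result falls just short of 12.5, in line with the 12.49999 quoted above.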
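Anyone who uses implicit typing in the 21st century deserves all the pain that comes their way. It may not be the source of the error you report, but modifying your program to eliminate that possible source of errors takes only a few seconds. –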
Totally agree with HPM. The reason I like FORTRAN is that any code should start with 'IMPLICIT NONE'. In general, IMHO, that is the best motto ever. – user463035818
At least he is using an explicit 'implicit' statement, which makes it obvious that he is trying to pass 'real*8' variables as 'float'. – tera
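The mismatch between 'real*8' and 'float' can be reproduced on the host without a GPU. The sketch below is illustrative only: `dot_as_double` and `dot_as_float_view` are hypothetical helpers, not part of the posted program. The first reads the arrays as the 8-byte values they really are. The second emulates what `d_one_` does when it declares `float *` parameters: it reads the first `n * sizeof(float)` bytes of each array as if they held `n` floats.

```c
#include <stddef.h>
#include <string.h>

/* Correctly typed receiver: Fortran real*8 corresponds to C double. */
double dot_as_double(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Emulates the bug: the caller owns doubles, but the callee walks the same
   memory in 4-byte steps and interprets each step as a float. memcpy keeps
   the reinterpretation well-defined C. Only the first n * 4 bytes are
   touched, so just half of the n doubles are visited at all. */
float dot_as_float_view(const double *a, const double *b, size_t n)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    float sum = 0.0f;

    for (size_t i = 0; i < n; i++) {
        float af, bf;
        memcpy(&af, pa + i * sizeof(float), sizeof(float));
        memcpy(&bf, pb + i * sizeof(float), sizeof(float));
        sum += af * bf;
    }
    return sum;
}
```

For `a = {0.5, 1.5, 2.5, 3.5}` and `b = {1, 1, 1, 1}`, `dot_as_double` returns 8, while `dot_as_float_view` returns an unrelated number whose exact value depends on the machine's byte order. The same thing happens in the posted code, so one likely fix is to use `double` throughout `d_one_` and `loop_d` (pointer types, `sizeof(double)` in the `cudaMalloc` and `cudaMemcpy` calls, and the shared array). The alternative is to convert the data to 'real*4' on the Fortran side before the call.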