2017-07-25 75 views

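I have code that adds two multidimensional arrays using PyOpenCL. My problem is that the result is wrong everywhere except along the first dimension (only the first row comes out right). I have been consulting this link on PyOpenCL multidimensional arrays.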

from __future__ import absolute_import, print_function 
import numpy as np 
import pyopencl as cl 

N = 4 
a_np = np.random.rand(N,N).astype(np.float32) 
b_np = np.random.rand(N,N).astype(np.float32) 

ctx = cl.create_some_context() 
queue = cl.CommandQueue(ctx) 

mf = cl.mem_flags 
a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np) 
b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np) 

prg = cl.Program(ctx, """ 
    __kernel void sum(
     __global const float *a_g, __global const float *b_g, __global float *res_g) { 
      int i = get_global_id(1); 
      int j = get_global_id(0); 
      res_g[i,j] = a_g[i,j] + b_g[i,j]; 
    } 
""").build() 

res_g = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes) 
prg.sum(queue, a_np.shape, None, a_g, b_g, res_g) 

res_np = np.empty_like(a_np) 
cl.enqueue_copy(queue, res_np, res_g) 

# Check on CPU with Numpy: 
print(res_np - (a_np + b_np)) 
print(np.linalg.norm(res_np - (a_np + b_np))) 
print (res_np) 
print (a_np + b_np) 

Matrix 1:

[[ 0.2990678 0.76585543 0.71866363 0.30202991] 
[ 0.20604192 0.01989171 0.02402978 0.82826865] 
[ 0.75456071 0.62410605 0.4374246 0.85372066] 
[ 0.37000021 0.5734672 0.4250721 0.2456535 ]] 

Matrix 2:

[[ 0.83109927 0.53289926 0.24182947 0.39531609] 
[ 0.53014964 0.62028325 0.2397541 0.03364789] 
[ 0.83543158 0.1162187 0.21168791 0.22438531] 
[ 0.2178313 0.76118374 0.23737679 0.41660839]] 

Expected result:

[[ 1.13016701 1.29875469 0.96049309 0.69734597] 
[ 0.73619157 0.64017498 0.26378387 0.86191654] 
[ 1.58999228 0.74032474 0.64911252 1.07810593] 
[ 0.5878315 1.33465099 0.66244888 0.6622619 ]] 

Script result:

[[ 1.13016701 1.29875469 0.96049309 0.69734597] 
[ 0.   0.   0.   0.  ] 
[ 0.   0.   0.   0.  ] 
[ 0.   0.   0.   0.  ]] 

Answer


The problem is here:

res_g[i,j] = a_g[i,j] + b_g[i,j];

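This is not how you access the elements of a multidimensional array in OpenCL. OpenCL C is based on a subset of C, and according to Wikipedia: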

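In the C and C++ programming languages, the comma operator (represented by the token ,) is a binary operator that evaluates its first operand and discards the result, and then evaluates the second operand and returns this value (and type).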

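So the line is effectively evaluated as: res_g[j] = a_g[j] + b_g[j]; Because j only runs from 0 to N-1, only the first N elements of the buffer (the first row) are ever written.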
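To see why only the first row is filled in, the sketch below emulates the kernel's work-items in plain Python over flat lists. It is an illustration only and does not use PyOpenCL: `run_buggy_kernel` and `touched` are hypothetical names, and the zero-filled list merely stands in for the device buffer, whose untouched contents are not guaranteed to be zero on a real device.

```python
# Plain-Python emulation of the kernel launch (illustrative, no OpenCL needed).
# In C, the expression `i,j` evaluates to `j`, so res_g[i,j] means res_g[j].

N = 4

def run_buggy_kernel(a, b, n):
    """Emulate every work-item (i, j) writing to flat index j only."""
    res = [0.0] * (n * n)      # stand-in for the output buffer
    touched = set()            # flat indices that were actually written
    for i in range(n):         # plays the role of get_global_id(1)
        for j in range(n):     # plays the role of get_global_id(0)
            idx = j            # what `i,j` evaluates to in C
            res[idx] = a[idx] + b[idx]
            touched.add(idx)
    return res, sorted(touched)

# Flat, row-major 4x4 inputs with arbitrary non-zero values.
a = [k + 1.0 for k in range(N * N)]
b = [10.0 * k for k in range(N * N)]

res, touched = run_buggy_kernel(a, b, N)
print(touched)   # only the first N flat indices are written
print(res)       # everything after the first row keeps its initial value
```

All N*N work-items run, but they collide on the same N slots, which matches the script output in the question.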

So the correct version should look like this:

res[i + size * j] = ...
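As a rough check of the flat-index formula, the following sketch emulates the work-items in plain Python, again without OpenCL. The name `run_fixed_kernel` is hypothetical, and `size` is assumed to be the row length N; in a real kernel it would have to be made available somehow, for example as an extra kernel argument or via `get_global_size`.

```python
# Plain-Python emulation of the flat indexing (illustrative, no OpenCL needed).

N = 4

def run_fixed_kernel(a, b, size):
    """Emulate every work-item (i, j) writing to flat index i + size * j."""
    res = [None] * (size * size)    # None marks slots that were never written
    for i in range(size):           # plays the role of get_global_id(1)
        for j in range(size):       # plays the role of get_global_id(0)
            idx = i + size * j      # one distinct flat index per work-item
            res[idx] = a[idx] + b[idx]
    return res

# Flat 4x4 inputs with arbitrary values.
a = [k + 1.0 for k in range(N * N)]
b = [10.0 * k for k in range(N * N)]

res = run_fixed_kernel(a, b, N)
expected = [x + y for x, y in zip(a, b)]
print(None in res)       # False: every slot was written
print(res == expected)   # True: matches the element-wise sum
```

For an element-wise sum it does not matter whether `i` or `j` is treated as the row index, as long as all three buffers use the same formula.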

Consult the link you provided again; everything is explained there.