The following is my working code, for reference. How do I release GPU memory and use the same buffer for different arrays in PyOpenCL?

```python
import numpy
import pyopencl as cl
import pyopencl.array  # needed for cl.array.vec.float4

vector = numpy.array([1, 2, 4, 8], numpy.float32)  # cl.array.vec.float4
matrix = numpy.zeros((1, 4), cl.array.vec.float4)
matrix[0, 0] = (1, 2, 4, 8)
matrix[0, 1] = (16, 32, 64, 128)
matrix[0, 2] = (3, 6, 9, 12)
matrix[0, 3] = (5, 10, 15, 25)
# vector[0] = (1, 2, 4, 8)
platform = cl.get_platforms()  # all platforms that exist on this machine
device = platform[0].get_devices(device_type=cl.device_type.GPU)  # all GPUs on the first platform in the platform list
context = cl.Context(devices=[device[0]])  # context for the devices in the "device" list above; context.num_devices gives the number of devices in this context
print("everything good so far")
program = cl.Program(context, """
__kernel void matrix_dot_vector(__global const float4 *matrix, __global const float4 *vector, __global float *result)
{
    int gid = get_global_id(0);
    result[gid] = dot(matrix[gid], vector[0]);
}
""").build()
queue = cl.CommandQueue(context)
# queue = cl.CommandQueue(context, device[0])  # queue tied to a specific device, if we plan on using multiple GPUs in parallel
mem_flags = cl.mem_flags
matrix_buf = cl.Buffer(context, mem_flags.READ_ONLY | mem_flags.COPY_HOST_PTR, hostbuf=matrix)
vector_buf = cl.Buffer(context, mem_flags.READ_ONLY | mem_flags.COPY_HOST_PTR, hostbuf=vector)
matrix_dot_vector = numpy.zeros(4, numpy.float32)
global_size_of_GPU = 0
destination_buf = cl.Buffer(context, mem_flags.WRITE_ONLY, matrix_dot_vector.nbytes)
# threads_size_buf = cl.Buffer(context, mem_flags.WRITE_ONLY, global_size_of_GPU.nbytes)
program.matrix_dot_vector(queue, matrix_dot_vector.shape, None, matrix_buf, vector_buf, destination_buf)
## Step #11. Move the kernel’s output data to host memory.
cl.enqueue_copy(queue, matrix_dot_vector, destination_buf)
# cl.enqueue_copy(queue, global_size_of_GPU, threads_size_buf)
print(matrix_dot_vector)
# print(global_size_of_GPU)
# COPY SAME ARRAY FROM GPU AGAIN
cl.enqueue_copy(queue, matrix_dot_vector, destination_buf)
print(matrix_dot_vector)
print('copied same array twice')
```

- How can I free the GPU memory held by matrix_buf and destination_buf? One is read-only and the other is write-only.
- How can I load a different matrix array into the same matrix_buf without having to create a new buffer in PyOpenCL? I have read that loading new data into an existing buffer is much faster than re-creating a buffer of the same size every time. (A rough sketch of what I mean follows this list.)
- Is it OK if the new array I load into the old buffer is smaller than the old array that was in it, or must the new array be exactly the same size as the buffer?
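
To make the second and third questions concrete, here is an untested sketch of what I am hoping is possible. It reuses the objects (`queue`, `matrix_buf`, `vector_buf`, `destination_buf`, `matrix_dot_vector`) created in the code above, and the values in `new_matrix` are made up purely for illustration:

```python
# Untested sketch: reuse the existing matrix_buf for a new matrix of the
# same size instead of creating another cl.Buffer.
new_matrix = numpy.zeros((1, 4), cl.array.vec.float4)
new_matrix[0, 0] = (2, 4, 6, 8)
new_matrix[0, 1] = (1, 3, 5, 7)
new_matrix[0, 2] = (9, 8, 7, 6)
new_matrix[0, 3] = (0, 1, 0, 1)

# enqueue_copy with the device buffer as the destination overwrites its
# contents with the new host data (host -> device copy):
cl.enqueue_copy(queue, matrix_buf, new_matrix)

# Run the kernel again with the refreshed buffer and read the result back:
program.matrix_dot_vector(queue, matrix_dot_vector.shape, None,
                          matrix_buf, vector_buf, destination_buf)
cl.enqueue_copy(queue, matrix_dot_vector, destination_buf)
print(matrix_dot_vector)
```

In other words: can `cl.enqueue_copy` with the buffer as the destination replace creating a new `cl.Buffer`, and does the same approach still work when `new_matrix` is smaller than the buffer?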
Could you please explain `release()` and `pyopencl.enqueue_map_buffer()` with an example? I tried reading the link you provided, but it is hard to follow. –
Take a look at the examples here: [pyopencl.buffer.release](http://nullege.com/codes/search?cq=pyopencl.buffer.release) and [pyopencl.enqueue_map_buffer](http://nullege.com/codes/search?cq=pyopencl.enqueue_map_buffer) – doqtor
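
For reference, a minimal untested sketch of how those two calls might be used with the buffers from the question, based on the PyOpenCL documentation (`mapped.base` is assumed here to be the memory-map object associated with the array returned by `enqueue_map_buffer`):

```python
# Map destination_buf into host memory instead of copying it with enqueue_copy.
# The call returns a numpy view onto the buffer plus an event.
mapped, event = cl.enqueue_map_buffer(
    queue, destination_buf, cl.map_flags.READ,
    0,                         # byte offset into the buffer
    matrix_dot_vector.shape,   # shape of the returned numpy array
    matrix_dot_vector.dtype)   # dtype of the returned numpy array
print(mapped)                  # read the results through the mapped view
mapped.base.release(queue)     # unmap before a kernel touches the buffer again

# release() hands the device memory back immediately, rather than waiting for
# the Python Buffer objects to be garbage collected:
matrix_buf.release()
destination_buf.release()
```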