2013-09-24 21 views

Is a shared memory address passed to a device function still shared memory?

Let's say I have this __device__ function:

__device__ unsigned char* dev_kernel(unsigned char* array_sh, int params){ 
    return array_sh + params; 
} 

And in the __global__ kernel I use it this way:

uarray = dev_kernel (uarray, params); 

where uarray is an array located in shared memory.

But when I use cuda-gdb to look at the address of uarray inside the __global__ kernel, I get:

(@generic unsigned char * @shared) 0x1000010 "z\377*" 

And inside the __device__ function, I get:

(unsigned char * @generic) 0x1000010 <Error reading address 0x1000010: Operation not permitted> 

Despite the error, the program runs correctly (maybe this is a limitation of cuda-gdb).

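So, I want to know: inside the __device__ function, is uarray still in shared memory? I moved the array from global to shared memory and the run time is almost the same (slightly worse with shared memory).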

Answer


So, I want to know: inside the __device__ function, is uarray still in shared memory?

Yes, when you pass a shared memory pointer to a device function this way, it still points to the same location in shared memory.
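One way to check this without a debugger is to ask the hardware which address space a generic pointer falls into. The following is a minimal sketch, not part of the original thread. It assumes a recent CUDA toolkit that provides the address space predicate function `__isShared()` (the CUDA 5.5 release discussed here may not have it), and the kernel name `check_kernel` is made up for illustration. `dev_kernel` has the same shape as in the question.

```cuda
#include <cstdio>

#define SSIZE 256

// Same shape as dev_kernel in the question: plain pointer arithmetic
// on a generic pointer.
__device__ unsigned char* dev_kernel(unsigned char* array_sh, int params) {
    // __isShared() returns nonzero when the generic address lies in
    // the shared memory window (assumes a toolkit that provides it).
    printf("inside device function: __isShared = %u\n", __isShared(array_sh));
    return array_sh + params;
}

// Hypothetical kernel name, used only for this illustration.
__global__ void check_kernel() {
    __shared__ unsigned char myshared[SSIZE];
    for (int i = 0; i < SSIZE; i++)
        myshared[i] = (unsigned char)i;

    unsigned char* loc = dev_kernel(myshared, 5);
    // The returned pointer is still an address inside myshared.
    printf("after return: __isShared = %u, val = %d\n", __isShared(loc), *loc);
}

int main() {
    check_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

If the pointer really does stay in shared memory, both `__isShared` calls should report a nonzero value, regardless of how cuda-gdb labels the pointer type.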

In response to the questions posted below, which have me perplexed, I elected to show a simple example:

$ cat t249.cu 
#include <stdio.h> 

#define SSIZE 256 

__device__ unsigned char* dev_kernel(unsigned char* array_sh, int params){ 
    return array_sh + params; 
} 

__global__ void mykernel(){ 
    __shared__ unsigned char myshared[SSIZE]; 
    __shared__ unsigned char *u_array; 
    for (int i = 0; i< SSIZE; i++) 
        myshared[i] = (unsigned char) i; 
    unsigned char *loc = dev_kernel(myshared, 5); 
    u_array = loc; 
    printf("val = %d\n", *loc); 
    printf("val = %d\n", *u_array); 
} 

int main(){ 

    mykernel<<<1,1>>>(); 
    cudaDeviceSynchronize(); 
    return 0; 
} 
$ nvcc -arch=sm_20 -g -G -o t249 t249.cu 
$ cuda-gdb ./t249 
NVIDIA (R) CUDA Debugger 
5.5 release 
.... 
Reading symbols from /home/user2/misc/t249...done. 
(cuda-gdb) break mykernel 
Breakpoint 1 at 0x4025dc: file t249.cu, line 9. 
(cuda-gdb) run 
Starting program: /home/user2/misc/t249 
[Thread debugging using libthread_db enabled] 

Breakpoint 1, mykernel() at t249.cu:9 
9  __global__ void mykernel(){ 
(cuda-gdb) break 14 
Breakpoint 2 at 0x4025e1: file t249.cu, line 14. 
(cuda-gdb) continue 
Continuing. 
[New Thread 0x7ffff725a700 (LWP 26184)] 
[Context Create of context 0x67e360 on Device 0] 
[Launch of CUDA Kernel 0 (mykernel<<<(1,1,1),(1,1,1)>>>) on Device 0] 
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 2, warp 0, lane 0] 

Breakpoint 1, mykernel<<<(1,1,1),(1,1,1)>>>() at t249.cu:12 
12  for (int i = 0; i< SSIZE; i++) 
(cuda-gdb) continue 
Continuing. 

Breakpoint 2, mykernel<<<(1,1,1),(1,1,1)>>>() at t249.cu:14 
14  unsigned char *loc = dev_kernel(myshared, 5); 
(cuda-gdb) print &(myshared[0]) 
$1 = (@shared unsigned char *) 0x8 "" 
    ^
     | 
    cuda-gdb is telling you that this pointer is defined in a __shared__ statement, and therefore its storage is implicit and it is unmodifiable. 

(cuda-gdb) print &(u_array) 
$2 = (@generic unsigned char * @shared *) 0x0 
    ^      ^
     |       u_array is stored in shared memory. 
     u_array is a generic pointer, meaning it can point to anything. 

(cuda-gdb) step 
dev_kernel(unsigned char * @generic, int) (array_sh=0x1000008 "", params=5) 
    at t249.cu:6 
6   return array_sh + params; 
(cuda-gdb) print array_sh 
$3 = (@generic unsigned char * @register) 0x1000008 "" 
     ^     ^
      |     array_sh is stored in a register. 
     array_sh is a generic pointer, it can point to anything. 

(cuda-gdb) print u_array 
No symbol "u_array" in current context. 
(note that I can't access u_array from inside the __device__ function, so I don't understand your comment there.) 

(cuda-gdb) step 
mykernel<<<(1,1,1),(1,1,1)>>>() at t249.cu:15 
15  u_array = loc; 
(cuda-gdb) step 
16  printf("val = %d\n", *loc); 
(cuda-gdb) print u_array 
$4 = (@generic unsigned char * @shared) 0x100000d ...... 
    ^      ^
     |      u_array is stored in shared memory 
    u_array is a generic pointer, it can point to anything 
(cuda-gdb) 

Although you didn't provide it, I am assuming your definition of u_array is similar to mine, based on the cuda-gdb output you're getting.

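Note that indicators like @shared do not tell you what kind of memory a pointer points to. They tell you either what kind of pointer it is (one defined implicitly in a __shared__ statement) or where the pointer itself is stored (in shared memory).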
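The difference between where a pointer is stored and what it points to can be sketched in code. This is an illustrative example only, not from the original thread. The names `qualifier_demo`, `sbuf`, `gbuf`, `ptr_in_reg`, and `ptr_in_shared` are made up, and it assumes a recent CUDA toolkit that provides the `__isShared()` and `__isGlobal()` address space predicates.

```cuda
#include <cstdio>

// Hypothetical kernel: contrasts where a pointer variable lives with
// what kind of memory it points to.
__global__ void qualifier_demo(unsigned char* gbuf) {
    __shared__ unsigned char sbuf[16];        // data stored in shared memory
    __shared__ unsigned char* ptr_in_shared;  // the pointer variable itself is stored in shared memory

    sbuf[0] = 1;
    gbuf[0] = 2;

    // Ordinary local variable (typically kept in a register) that
    // points AT shared memory.
    unsigned char* ptr_in_reg = sbuf;

    // Pointer variable stored IN shared memory that points at global memory.
    ptr_in_shared = gbuf;

    printf("ptr_in_reg:    __isShared = %u, __isGlobal = %u\n",
           __isShared(ptr_in_reg), __isGlobal(ptr_in_reg));
    printf("ptr_in_shared: __isShared = %u, __isGlobal = %u\n",
           __isShared(ptr_in_shared), __isGlobal(ptr_in_shared));
}

int main() {
    unsigned char* gbuf = nullptr;
    if (cudaMalloc(&gbuf, 16) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    qualifier_demo<<<1, 1>>>(gbuf);  // one thread, so no race on ptr_in_shared
    cudaDeviceSynchronize();
    cudaFree(gbuf);
    return 0;
}
```

Here the pointer held in a local variable targets shared memory, while the pointer stored in shared memory targets global memory, so a storage label on the pointer variable says nothing about its target.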

If this doesn't sort out your questions, please provide a complete example along with the complete cuda-gdb session output, just as I have.


Thanks, but why doesn't cuda-gdb say that it is @shared, like it does in the '__global__' kernel? – Blufter


Are you using standalone cuda-gdb or the Nsight Eclipse Edition? Which version of CUDA are you using? –


I'm using standalone cuda-gdb, and the CUDA version is 5.5. – Blufter