Multiple pointers into a single allocated array in device memory in CUDA

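I would like to know whether it is possible to set up multiple pointers into a single block of data that has already been allocated in memory. The reason I ask is that I was implementing lexicographical sorting on the GPU with the help of Thrust vectors, and the result performed very badly in terms of time.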

For example, I am trying to achieve the equivalent of these C++ statements:

unsigned int * pword;  //setting up the array of memory for permutations of word 
pword = new unsigned int [N*N]; 

unsigned int* * p_pword; //pointers to permutation words 
p_pword = new unsigned int* [N]; 

//setting up the pointers on the locations such that if N=4 then 0,4,8,12,... 
int count; 
for(count=0;count<N;count++) 
     p_pword[count]=&pword[count*N];

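I am not asking anyone to provide code; I just want to know whether there is any way to set up pointers into a single data array. PS: I have tried the following, but it gave no speedup at all:

int * raw_ptr = thrust::raw_pointer_cast(&d_Data[0]); //doing same with multiple pointers

My guess is that access may be slow because I am pointing into a device_vector.

Any help in this regard is highly appreciated.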

2013-04-08 Asif Ali

Well, this doesn't make any sense:

int * raw_ptr = thrust::raw_pointer_cast([0]); 
             ^what is this??

I don't think that line will compile correctly.

But in Thrust you can certainly do something like this:

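#include <cstdio>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/sequence.h>
#include <thrust/copy.h>

int main(){

    const int N = 16;                          // constant, so p_A is an ordinary fixed-size array
    thrust::device_vector<int> d_A(4*N);       // one allocation of 4*N ints
    thrust::sequence(d_A.begin(), d_A.end());  // fill with 0, 1, 2, ...
    thrust::device_ptr<int> p_A[N];            // N pointers into that single allocation
    for (int i=0; i<N; i++)
        p_A[i] = &(d_A[4*i]);                  // p_A[i] points at element 4*i
    thrust::host_vector<int> h_A(N);
    thrust::copy(p_A[4], p_A[8], h_A.begin()); // copies elements 16..31 (16 values)
    for (int i=0; i<N; i++)
        printf("h_A[%d] = %d\n", i, h_A[i]);
    return 0;
}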

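I'm not sure what to say about speedup. Talking about speedup in the context of the small snippet you posted doesn't make much sense to me.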

2013-04-08 06:49:12

Thanks again for the reply, Robert Crovella. Actually, this is what I was trying to do: int * raw_ptr = thrust::raw_pointer_cast(&d_Data[0]); – 2013-04-08 19:13:19

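Nice answer, by the way (though I have already used Thrust device_ptrs this way), and sorry if my question was confusing. What I wanted to ask is whether there is any way to make an array of pointers that hold addresses into a single array in CUDA memory (say unsigned int * d_Data). I have implemented the same logic as in your example above, but I was looking for multiple pointers into a single plain array (not a device_vector). – 2013-04-08 19:20:12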

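This should work. However, it creates a pointer that is not convenient to use in Thrust algorithms. You can still use that pointer in ordinary CUDA code. – 2013-04-08 19:20:43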
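To make the last two comments concrete, here is a minimal sketch of a table of row pointers into a single plain device array, used from an ordinary CUDA kernel. It is an illustration under assumptions, not code from the thread: the kernel sumRows, the helper rowSumsViaPointerTable, the CUDA_CHECK macro, and the choice of summing each row are all hypothetical names and choices made for the example. Only d_Data and the N*N layout come from the question.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-checking macro for this sketch.
#define CUDA_CHECK(call)                                         \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            std::fprintf(stderr, "CUDA error: %s\n",             \
                         cudaGetErrorString(err_));              \
            std::exit(EXIT_FAILURE);                             \
        }                                                        \
    } while (0)

// One thread per row; each row is reached through the pointer table.
__global__ void sumRows(unsigned int* const* rows, int n, unsigned int* out)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < n) {
        unsigned int s = 0;
        for (int j = 0; j < n; ++j)
            s += rows[r][j];
        out[r] = s;
    }
}

// Hypothetical helper: fills an n*n device array with 0..n*n-1, builds a
// table of n row pointers into it, and returns the sum of each row.
std::vector<unsigned int> rowSumsViaPointerTable(int n)
{
    assert(n > 0);
    const size_t count = static_cast<size_t>(n) * n;

    std::vector<unsigned int> h_data(count);
    for (size_t i = 0; i < count; ++i)
        h_data[i] = static_cast<unsigned int>(i);

    // The single allocation (plain device memory, not a device_vector).
    unsigned int* d_Data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_Data, count * sizeof(unsigned int)));
    CUDA_CHECK(cudaMemcpy(d_Data, h_data.data(), count * sizeof(unsigned int),
                          cudaMemcpyHostToDevice));

    // Device addresses can be computed on the host with pointer arithmetic,
    // as long as the host never dereferences them.
    std::vector<unsigned int*> h_rows(n);
    for (int i = 0; i < n; ++i)
        h_rows[i] = d_Data + static_cast<size_t>(i) * n;

    // The pointer table itself must also live in device memory.
    unsigned int** d_rows = nullptr;
    CUDA_CHECK(cudaMalloc(&d_rows, n * sizeof(unsigned int*)));
    CUDA_CHECK(cudaMemcpy(d_rows, h_rows.data(), n * sizeof(unsigned int*),
                          cudaMemcpyHostToDevice));

    unsigned int* d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(unsigned int)));

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    sumRows<<<blocks, threads>>>(d_rows, n, d_out);
    CUDA_CHECK(cudaGetLastError());

    std::vector<unsigned int> h_out(n);
    CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, n * sizeof(unsigned int),
                          cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_out));
    CUDA_CHECK(cudaFree(d_rows));
    CUDA_CHECK(cudaFree(d_Data));
    return h_out;
}

int main()
{
    std::vector<unsigned int> sums = rowSumsViaPointerTable(4);
    for (size_t i = 0; i < sums.size(); ++i)
        std::printf("row %zu sum = %u\n", i, sums[i]);
    return 0;
}
```

The pointer table adds an extra memory load per access, so indexing d_Data[r*n + j] directly in the kernel is usually at least as fast. The table is mainly useful when the rows need to be reordered without moving the data. If one of these raw pointers is later needed in a Thrust algorithm, it can be wrapped again with thrust::device_pointer_cast.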
