2012-06-22 58 views
5

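Sending a 2D array to a CUDA kernel

I'm having some trouble understanding how to send a 2D array to CUDA. I have a program that parses a large file with 30 data points on each line. I read about 10 lines at a time and then create a matrix of lines and items (so in my example of 10 lines with 30 data points each, it would be int list[10][30];). My goal is to send this array to my kernel and have each block process one row (I have this working perfectly in plain C, but CUDA is a bit more challenging).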

Here is what I have so far, with no luck (note: sizeOfBuckets = rows, and sizeOfBucketsHoldings = items per row... I know I should win an award for odd variable names):

int list[sizeOfBuckets][sizeOfBucketsHoldings]; //this is created at the start of the file and I have confirmed it's filled with the correct data
#define sizeOfBuckets 10 //size of buckets before sending to process list
#define sizeOfBucketsHoldings 30
//Cuda part
//define device variables
int *dev_current_list[sizeOfBuckets][sizeOfBucketsHoldings];
//time to malloc the 2D array on device
size_t pitch;
cudaMallocPitch((int**)&dev_current_list, (size_t *)&pitch, sizeOfBucketsHoldings * sizeof(int), sizeOfBuckets);

//copy data from host to device
cudaMemcpy2D(dev_current_list, pitch, list, sizeOfBuckets * sizeof(int), sizeOfBuckets * sizeof(int), sizeOfBucketsHoldings * sizeof(int), cudaMemcpyHostToDevice);

process_list<<<count,1>>> (sizeOfBuckets, sizeOfBucketsHoldings, dev_current_list, pitch);
//free memory of device
cudaFree(dev_current_list);


__global__ void process_list(int sizeOfBuckets, int sizeOfBucketsHoldings, int *current_list, int pitch) {
    int tid = blockIdx.x;
    for (int r = 0; r < sizeOfBuckets; ++r) {
        int* row = (int*)((char*)current_list + r * pitch);
        for (int c = 0; c < sizeOfBucketsHoldings; ++c) {
            int element = row[c];
        }
    }
}

The error I get is:

main.cu(266): error: argument of type "int *(*)[30]" is incompatible with parameter of type "int *" 
1 error detected in the compilation of "/tmp/tmpxft_00003f32_00000000-4_main.cpp1.ii". 

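Line 266 is the kernel call process_list<<<count,1>>> (count, countListItem, dev_current_list, pitch); I think the problem is that my function declares the array parameter as int *, but how else can I declare it? In my pure C code I used int current_list[num_of_rows][num_items_in_row], but I can't get the same result in CUDA.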

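My end goal is simple: I just want each block to process one row (sizeOfBuckets) and then loop through all the items in that row (sizeOfBucketsHoldings). I originally did a normal cudaMalloc and cudaMemcpy, but it didn't work, so I looked around and found cudaMallocPitch and cudaMemcpy2D (neither of which is in my CUDA by Example book). I have been trying to study examples, but they seem to give me the same error (I found this idea on page 22 of the CUDA C Programming Guide, which I'm currently reading, but still no luck). Any ideas? Or suggestions on where to look?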

Edit: To test this, I just want to add the values of each row together (I copied the logic from the array-addition example in CUDA by Example). My kernel:

__global__ void process_list(int sizeOfBuckets, int sizeOfBucketsHoldings, int *current_list, size_t pitch, int *total) { 
    //TODO: we need to flip the list as well 
    int tid = blockIdx.x; 
    for (int c = 0; c < sizeOfBucketsHoldings; ++c) { 
     total[tid] = total + current_list[tid][c]; 
    } 
} 

Here is how I declare the total array in my main:

int *dev_total; 
cudaMalloc((void**)&dev_total, sizeOfBuckets * sizeof(int)); 

Answers

3

You have some mistakes in your code.

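  • When you copy the host array to the device, you should pass a one-dimensional host pointer. See the cudaMemcpy2D function signature.
  • You don't need to allocate a static 2D array for device memory. That declaration creates a static array in host memory, which you then try to recreate as a device array. Keep in mind that the device array must be one-dimensional, too. See the cudaMallocPitch function signature.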
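To make the two points above concrete, here is a minimal host-only sketch. It assumes the 10 x 30 layout from the question; `kRows`, `kCols`, `flat_view`, `host_pitch_bytes` and `fill_demo` are hypothetical names introduced only for illustration. It shows that a static 2D host array can be handed over as a flat `const int*` with a row pitch in bytes, and that `int *x[10][30]` is an array of 300 pointers rather than a pointer to 300 ints.

```cpp
#include <cstddef>

constexpr std::size_t kRows = 10;  // sizeOfBuckets in the question (assumed)
constexpr std::size_t kCols = 30;  // sizeOfBucketsHoldings in the question (assumed)

// int *x[kRows][kCols] declares kRows * kCols separate pointers, which is why
// the compiler rejects it where a single int* is expected.
static_assert(sizeof(int* [kRows][kCols]) == kRows * kCols * sizeof(int*),
              "a 2D array of pointers holds one pointer per element");

// Hypothetical helper: view a static 2D host array as the flat pointer that a
// 2D copy routine takes as its source argument.
inline const int* flat_view(const int (&list)[kRows][kCols]) {
    return &list[0][0];
}

// Host pitch (bytes per row) of a tightly packed static array.
inline std::size_t host_pitch_bytes() {
    return kCols * sizeof(int);
}

// Hypothetical helper: fill list[r][c] with a value that encodes its position.
inline void fill_demo(int (&list)[kRows][kCols]) {
    for (std::size_t r = 0; r < kRows; ++r)
        for (std::size_t c = 0; c < kCols; ++c)
            list[r][c] = static_cast<int>(r * 100 + c);
}
```

Under these assumptions, the device side would be declared as a plain `int* dev_current_list;`, and the copy would take `flat_view(list)` as the source with `host_pitch_bytes()` as both the source pitch and the width in bytes, and `kRows` as the height.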

This example should help you with the memory allocation:

#include <cstdio>
#include <cuda_runtime.h>

// One block per row: block tid sums the items of row tid into total[tid].
__global__ void process_list(int sizeOfBucketsHoldings, int* total, int* current_list, size_t pitch)
{
    int tid = blockIdx.x;
    // pitch is in bytes, so offset through a char* before indexing as int
    int* row = (int*)((char*)current_list + tid * pitch);
    total[tid] = 0;
    for (int c = 0; c < sizeOfBucketsHoldings; ++c)
    {
        total[tid] += row[c];
    }
}

int main()
{
    const int sizeOfBuckets = 10;         // rows
    const int sizeOfBucketsHoldings = 30; // items per row

    size_t width = sizeOfBucketsHoldings * sizeof(int); // needs to be in bytes
    size_t height = sizeOfBuckets;

    int* list = new int[sizeOfBuckets * sizeOfBucketsHoldings]; // one-dimensional
    for (int i = 0; i < sizeOfBuckets; i++)
        for (int j = 0; j < sizeOfBucketsHoldings; j++)
            list[i * sizeOfBucketsHoldings + j] = i;

    size_t pitch_h = sizeOfBucketsHoldings * sizeof(int); // always in bytes

    int* dev_current_list;
    size_t pitch_d;
    cudaMallocPitch((void**)&dev_current_list, &pitch_d, width, height);

    int* test;
    cudaMalloc((void**)&test, sizeOfBuckets * sizeof(int));
    int* h_test = new int[sizeOfBuckets];

    cudaMemcpy2D(dev_current_list, pitch_d, list, pitch_h, width, height, cudaMemcpyHostToDevice);

    process_list<<<sizeOfBuckets, 1>>>(sizeOfBucketsHoldings, test, dev_current_list, pitch_d);
    cudaDeviceSynchronize();

    cudaMemcpy(h_test, test, sizeOfBuckets * sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < sizeOfBuckets; i++)
        printf("%d %d\n", i, h_test[i]);

    // release device and host memory
    cudaFree(dev_current_list);
    cudaFree(test);
    delete[] list;
    delete[] h_test;
    return 0;
}

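To access the 2D array in the kernel, you should use the pattern base_addr + y * pitch_d + x. WARNING: the pitch is always in bytes. You need to cast your pointer to a byte pointer (char*) before adding y * pitch_d, then cast back to int* before adding x.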
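The addressing pattern can be wrapped in a small helper so the byte arithmetic lives in one place. The following is a sketch under stated assumptions: `pitched_at`, `sum_row` and the `HOST_DEVICE` macro are hypothetical names, not part of the CUDA API, and the padded `std::vector` is only a host-side stand-in for memory that would normally come from cudaMallocPitch.

```cpp
#include <cstddef>
#include <vector>

// Expands to __host__ __device__ under nvcc and to nothing under a plain C++
// compiler, so the same helper can be used in a kernel and in host code.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: address of element (row, col) in a pitched allocation.
// pitch_bytes is the row stride in BYTES, so the row offset is applied through
// a char* and the column offset through an int*.
HOST_DEVICE inline int* pitched_at(int* base, std::size_t pitch_bytes,
                                   std::size_t row, std::size_t col) {
    char* row_start = reinterpret_cast<char*>(base) + row * pitch_bytes;
    return reinterpret_cast<int*>(row_start) + col;
}

// Host-side stand-in for what one block does in the kernel: sum one row of a
// buffer whose rows are padded out to pitch_bytes.
inline int sum_row(std::vector<int>& storage, std::size_t pitch_bytes,
                   std::size_t row, std::size_t cols) {
    int total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        total += *pitched_at(storage.data(), pitch_bytes, row, c);
    return total;
}
```

Inside a kernel, the same helper would be called as `*pitched_at(current_list, pitch, blockIdx.x, c)`. Keeping the pitch as `size_t` avoids narrowing the value returned by cudaMallocPitch.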

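Thank you as always, Marina. I tried your setup, but when I try to launch the kernel I still get the same error: 'error: argument of type "int *(*)[sizeOfBucketsHoldings]" is incompatible with parameter of type "int *"'. Am I sending the array correctly? – Lostsoul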

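Sorry, I think I understand what you're doing now. I changed the host array to list and no longer get an error at compile time, but now I get 'Segmentation fault: 11'. It may have something to do with my test kernel. – Lostsoul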

Please update the code in the question so we can see what the current problem is. – geek