Sending a 2D array to a CUDA kernel

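I'm having some trouble understanding how to send a 2D array to CUDA. I have a program that parses a large file with 30 data points on each line. I read about 10 rows at a time and then build a matrix of rows and items (so in my example, 10 rows with 30 data points each, it would be int list[10][30];). My goal is to send this array to my kernel and have each block process one row (I have this working perfectly in plain C, but CUDA has been a bit more challenging).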

Here is what I have so far, but no luck (note: sizeOfBuckets = rows, and sizeOfBucketsHoldings = items per row... I know I should win an award for odd variable names):

int list[sizeOfBuckets][sizeOfBucketsHoldings]; //this is created at the start of the file and I can confirmed its filled with the correct data 
#define sizeOfBuckets 10 //size of buckets before sending to process list 
#define sizeOfBucketsHoldings 30 
    //Cuda part 
       //define device variables 
       int *dev_current_list[sizeOfBuckets][sizeOfBucketsHoldings]; 
       //time to malloc the 2D array on device 
       size_t pitch; 
       cudaMallocPitch((int**)&dev_current_list, (size_t *)&pitch, sizeOfBucketsHoldings * sizeof(int), sizeOfBuckets); 

       //copy data from host to device 
       cudaMemcpy2D(dev_current_list, pitch, list, sizeOfBuckets * sizeof(int), sizeOfBuckets * sizeof(int), sizeOfBucketsHoldings * sizeof(int),cudaMemcpyHostToDevice); 

       process_list<<<count,1>>> (sizeOfBuckets, sizeOfBucketsHoldings, dev_current_list, pitch); 
       //free memory of device 
       cudaFree(dev_current_list); 


    __global__ void process_list(int sizeOfBuckets, int sizeOfBucketsHoldings, int *current_list, int pitch) { 
     int tid = blockIdx.x; 
     for (int r = 0; r < sizeOfBuckets; ++r) { 
      int* row = (int*)((char*)current_list + r * pitch); 
      for (int c = 0; c < sizeOfBucketsHoldings; ++c) { 
       int element = row[c]; 
      } 
     }

The error I get is:

main.cu(266): error: argument of type "int *(*)[30]" is incompatible with parameter of type "int *" 
1 error detected in the compilation of "/tmp/tmpxft_00003f32_00000000-4_main.cpp1.ii".

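Line 266 is the kernel call process_list<<<count,1>>> (count, countListItem, dev_current_list, pitch); I think the problem is that I'm declaring the array parameter in my function as int *, but how else can I declare it? In my pure C code I use int current_list[num_of_rows][num_items_in_row], but I can't get the same thing to work in CUDA.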

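My end goal is simple: I just want each block to process one row (sizeOfBuckets) and then loop through all the items in that row (sizeOfBucketsHoldings). I originally did a normal cudaMalloc and cudaMemcpy, but that didn't work, so I looked around and found cudaMallocPitch and cudaMemcpy2D (neither of which is in my CUDA by Example book). I've been trying to study examples, but they all seem to give me the same error (I found this idea on page 22 of the CUDA C Programming Guide, which I'm currently reading, but still no luck). Any ideas? Or suggestions on where to look?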

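EDIT: To test this, I just want to add the values of each row together (I copied the logic from the array addition example in CUDA by Example). My kernel: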

__global__ void process_list(int sizeOfBuckets, int sizeOfBucketsHoldings, int *current_list, size_t pitch, int *total) { 
    //TODO: we need to flip the list as well 
    int tid = blockIdx.x; 
    for (int c = 0; c < sizeOfBucketsHoldings; ++c) { 
     total[tid] = total + current_list[tid][c]; 
    } 
}

Here is how I declare the total array in my main:

int *dev_total; 
cudaMalloc((void**)&dev_total, sizeOfBuckets * sizeof(int));


2012-06-22 Lostsoul

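You have a few mistakes in your code.

When you copy the host array to the device, you should pass a one-dimensional host pointer. See the function signature of cudaMemcpy2D.
You don't need to declare a static 2D array for the device memory. That declaration creates a static array in host memory, which you then try to reuse as the device array. Keep in mind that the device allocation must be addressed through a one-dimensional pointer, too. See the function signature of cudaMallocPitch.

This example should help you with the memory allocation: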

#include <cstdio>
#include <cuda_runtime.h>

// One block per row: block `tid` sums the sizeOfBucketsHoldings ints in row `tid`.
__global__ void process_list(int sizeOfBucketsHoldings, int* total, const int* current_list, size_t pitch)
{
    int tid = blockIdx.x;
    total[tid] = 0;
    for (int c = 0; c < sizeOfBucketsHoldings; ++c)
    {
        // pitch is in bytes: step to the row through a char*, then index as int
        total[tid] += *((const int*)((const char*)current_list + tid * pitch) + c);
    }
}

int main()
{
    const int sizeOfBuckets = 10;         // rows
    const int sizeOfBucketsHoldings = 30; // items per row

    size_t width = sizeOfBucketsHoldings * sizeof(int); // row width, needs to be in bytes
    size_t height = sizeOfBuckets;                      // number of rows

    int* list = new int[sizeOfBuckets * sizeOfBucketsHoldings]; // one-dimensional host array
    for (int i = 0; i < sizeOfBuckets; i++)
        for (int j = 0; j < sizeOfBucketsHoldings; j++)
            list[i * sizeOfBucketsHoldings + j] = i;

    size_t pitch_h = sizeOfBucketsHoldings * sizeof(int); // host pitch, always in bytes

    int* dev_current_list;
    size_t pitch_d; // device pitch chosen by cudaMallocPitch, in bytes
    cudaMallocPitch((void**)&dev_current_list, &pitch_d, width, height);

    int* test;
    cudaMalloc((void**)&test, sizeOfBuckets * sizeof(int));
    int* h_test = new int[sizeOfBuckets];

    cudaMemcpy2D(dev_current_list, pitch_d, list, pitch_h, width, height, cudaMemcpyHostToDevice);

    process_list<<<sizeOfBuckets, 1>>>(sizeOfBucketsHoldings, test, dev_current_list, pitch_d);
    cudaDeviceSynchronize();

    cudaMemcpy(h_test, test, sizeOfBuckets * sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < sizeOfBuckets; i++)
        printf("%d %d\n", i, h_test[i]);

    // release device and host memory
    cudaFree(dev_current_list);
    cudaFree(test);
    delete[] list;
    delete[] h_test;
    return 0;
}

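To access the 2D array in the kernel, use the pattern base_addr + y * pitch_d + x. Warning: the pitch is always in bytes, so y * pitch_d is a byte offset while x counts elements. You need to cast your pointer to a byte pointer (char*) before adding the row offset, then cast back to int* before indexing by x.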
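The addressing rule above can be sketched as a small helper. This is only an illustration, not part of the original answer: the names row_ptr, sum_row, make_padded and the HOST_DEVICE macro are hypothetical, and the padded host buffer is just a stand-in for memory returned by cudaMallocPitch, so the arithmetic can be tried without a GPU. The same source should compile under nvcc or an ordinary C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macro: the CUDA qualifiers are only emitted when compiling with nvcc.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: pointer to the first element of row `row`.
// `pitch_bytes` is the row stride in BYTES, so the offset is added to a char*.
HOST_DEVICE inline const int* row_ptr(const int* base, std::size_t pitch_bytes, std::size_t row)
{
    const char* bytes = reinterpret_cast<const char*>(base);
    return reinterpret_cast<const int*>(bytes + row * pitch_bytes);
}

// Hypothetical helper: sums the first `cols` ints of one row, skipping the padding.
HOST_DEVICE inline int sum_row(const int* base, std::size_t pitch_bytes,
                               std::size_t row, std::size_t cols)
{
    const int* r = row_ptr(base, pitch_bytes, row);
    int total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        total += r[c];
    return total;
}

// Host-only stand-in for a pitched allocation (assumption: pitch is a multiple
// of sizeof(int)). Row r holds the value r; padding is filled with -1.
inline std::vector<int> make_padded(std::size_t rows, std::size_t cols, std::size_t pitch_bytes)
{
    assert(pitch_bytes >= cols * sizeof(int));
    assert(pitch_bytes % sizeof(int) == 0);
    const std::size_t stride = pitch_bytes / sizeof(int); // stride in elements
    std::vector<int> buf(rows * stride, -1);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            buf[r * stride + c] = static_cast<int>(r);
    return buf;
}
```

For example, with make_padded(10, 30, 128), row 2 starts 256 bytes (64 ints) past the base pointer, and sum_row for row 3 returns 90 because the -1 padding is never read. In a kernel, the body of sum_row would be called with blockIdx.x as the row.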


2012-06-22 04:52:17 geek

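Thanks as always, Marina. I tried your setup, but when I try to launch the kernel I still get the same error: 'error: argument of type "int *(*)[sizeOfBucketsHoldings]" is incompatible with parameter of type "int *"'. Am I sending the array correctly? – Lostsoul

Sorry, I think I understand what you're doing now. I changed the host array to list and no longer get errors when compiling, but now I get 'Segmentation fault: 11', though it may be related to my test kernel. – Lostsoul

Please update the question with your current code so we can see where the problem is. – geek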
