我寫了一個小型cuda代碼來理解全局內存共享內存傳輸事務。代碼如下:從設備存儲器到共享存儲器需要CUDA設備內存交易
#include <iostream>
using namespace std;
__global__ void readUChar4(uchar4* c, uchar4* o){
extern __shared__ uchar4 gc[];
int tid = threadIdx.x;
gc[tid] = c[tid];
o[tid] = gc[tid];
}
int main(){
string a = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
uchar4* c;
cudaError_t e1 = cudaMalloc((void**)&c, 128*sizeof(uchar4));
if(e1==cudaSuccess){
uchar4* o;
cudaError_t e11 = cudaMalloc((void**)&o, 128*sizeof(uchar4));
if(e11 == cudaSuccess){
cudaError_t e2 = cudaMemcpy(c, a.c_str(), 128*sizeof(uchar4), cudaMemcpyHostToDevice);
if(e2 == cudaSuccess){
readUChar4<<<1,128, 128*sizeof(uchar4)>>>(c, o);
uchar4* oFromGPU = (uchar4*)malloc(128*sizeof(uchar4));
cudaError_t e22 = cudaMemcpy(oFromGPU, o, 128*sizeof(uchar4), cudaMemcpyDeviceToHost);
if(e22 == cudaSuccess){
for(int i =0; i < 128; i++){
cout << oFromGPU[i].x << " ";
cout << oFromGPU[i].y << " ";
cout << oFromGPU[i].z << " ";
cout << oFromGPU[i].w << " " << endl;
}
}
else{
cout << "Failed to copy from GPU" << endl;
}
}
else{
cout << "Failed to copy" << endl;
}
}
else{
cout << "Failed to allocate output memory" << endl;
}
}
else{
cout << "Failed to allocate memory" << endl;
}
return 0;
}
此代碼簡單地複製數據,並返回到設備存儲器。我有以下三個問題:
- 在這種情況下,從設備內存傳輸到共享內存是否保證了4次內存事務?我相信這取決於cudaMalloc如何分配內存;如果內存是以隨機方式分配的,這樣數據分散在內存中,那麼它將花費超過4次內存交易。但是,如果cudaMalloc以128字節塊的形式分配內存,或者連續分配內存,則不應超過4個內存事務。
- 上述邏輯是否也適用於將數據從共享內存寫入設備內存,即傳輸將在4個內存事務中完成。
- 該代碼是否會導致銀行衝突。我相信,如果線程按順序分配了ID,那麼這段代碼不會導致銀行衝突。但是,如果線程32和64計劃在同一個warp中運行,那麼此代碼可能會導致bank衝突。
寫這樣的microbenchmarks時要小心;編譯器可能會執行諸如優化寫入共享內存的操作。 – ArchaeaSoftware 2012-07-11 16:29:37