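Sparse matrix-vector multiplication in CUDA
I am trying to implement matrix-vector multiplication on the GPU using CUDA. In my C++ code (CPU), I load the matrix as a dense matrix and then perform the matrix-vector multiplication with CUDA. I also use shared memory to improve performance.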
- Given that my matrix is sparse, how can I load it efficiently?
Below is the C++ function I use to load the matrix:
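#include <algorithm>
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iostream>

// Storage-order flags (assumed values; the original code defines them elsewhere)
enum { ROW_MAJOR = 0, COLUMN_MAJOR = 1 };

// Reads "h w num_entries" followed by num_entries "row col value" triplets
// (0-based indices) into a dense h x w array.
bool readMatrix(const char* filename, float* &matrix, unsigned int *dim = NULL, int majority = ROW_MAJOR)
{
    unsigned int w, h, x, y, num_entries;
    float val;
    std::ifstream file(filename);
    if (!file) {
        std::cout << "Unable to open file\n";
        return false;
    }
    file >> h >> w >> num_entries;
    std::cout << w << " " << h << " " << num_entries << "\n";
    assert(w == h || w == 1 || h == 1);
    if (dim != NULL) *dim = std::max(w, h);
    // Value-initialize so entries missing from the file are 0, not garbage.
    matrix = new float[w * h]();
    unsigned int i;
    for (i = 0; i < num_entries; i++) {
        // Test the extraction itself; checking eof() before reading misses short or bad input.
        if (!(file >> y >> x >> val)) break;
        if (majority == ROW_MAJOR) {
            matrix[w * y + x] = val;
        } else if (majority == COLUMN_MAJOR) {
            matrix[h * x + y] = val;
        }
    }
    file.close();
    if (i == num_entries)
        std::cout << "\nFile read successfully\n";
    else
        std::cout << "\nFile read successfully but seems defective:\n num entries read = " << i << ", entries expected = " << num_entries << "\n";
    // Print the contents (the whole array, so only sensible for small inputs).
    if (w == h) {
        for (unsigned int r = 0; r < h; r++) {
            printf("\n");
            for (unsigned int c = 0; c < w; c++) {
                printf("%.2f ", matrix[c + w * r]);
            }
        }
    } else {
        printf("\n");
        // A vector is contiguous in either order, and w * h == max(w, h).
        for (unsigned int j = 0; j < w * h; j++) {
            printf("%.2f ", matrix[j]);
        }
    }
    return true;
}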
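Because the file already lists only the nonzero triplets, one option is to skip the dense `w * h` array entirely and build Compressed Sparse Row (CSR) arrays directly. The following is a minimal sketch under two assumptions: the file uses the same header and 0-based `row col value` triplets that `readMatrix` assumes, and `CsrMatrix` and `readMatrixCsr` are hypothetical names. It reads from any `std::istream`, so an `std::ifstream` works the same way as the string stream used for testing.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Compressed Sparse Row storage (hypothetical struct for this sketch).
struct CsrMatrix {
    unsigned int rows = 0, cols = 0;
    std::vector<int> rowPtr;    // rows + 1 offsets into colIdx/values
    std::vector<int> colIdx;    // column index of each stored entry
    std::vector<float> values;  // value of each stored entry
};

// Reads "h w num_entries" plus num_entries "row col value" triplets
// (assumed 0-based, in any order) and builds CSR without a dense array.
bool readMatrixCsr(std::istream& in, CsrMatrix& A) {
    unsigned int h, w, n;
    if (!(in >> h >> w >> n)) return false;
    std::vector<unsigned int> r(n), c(n);
    std::vector<float> v(n);
    unsigned int k = 0;
    for (; k < n; ++k) {
        if (!(in >> r[k] >> c[k] >> v[k])) break;
        if (r[k] >= h || c[k] >= w) return false;  // reject out-of-range indices
    }
    if (k != n) return false;  // fewer triplets than the header promised

    A.rows = h;
    A.cols = w;
    A.rowPtr.assign(h + 1, 0);
    for (unsigned int e = 0; e < n; ++e) A.rowPtr[r[e] + 1]++;             // count entries per row
    for (unsigned int i = 0; i < h; ++i) A.rowPtr[i + 1] += A.rowPtr[i];  // prefix sum -> offsets
    A.colIdx.resize(n);
    A.values.resize(n);
    std::vector<int> next(A.rowPtr.begin(), A.rowPtr.end() - 1);  // write cursor per row
    for (unsigned int e = 0; e < n; ++e) {
        int pos = next[r[e]]++;
        A.colIdx[pos] = static_cast<int>(c[e]);
        A.values[pos] = v[e];
    }
    return true;
}
```

Memory is O(rows + nonzeros) instead of O(rows × cols), and the three arrays can be copied to the device as they are. Entries keep their input order within a row, which is enough for SpMV; sort each row by column if a later step needs that.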
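Below is my CUDA kernel that performs the matrix-vector multiplication:
// Dense x = A * b with one thread per row. A is dim x dim, row-major.
// Launch with dim * sizeof(float) bytes of dynamic shared memory, so dim
// is capped by the per-block shared-memory limit.
__global__ void
_cl_matrix_vector_(const float *A, const float *b, float *x, int dim)
{
    extern __shared__ float vec[];
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Load the whole vector into shared memory (block-strided, bounds-checked).
    for (int i = threadIdx.x; i < dim; i += blockDim.x) {
        vec[i] = b[i];
    }
    // Make sure the vector is fully loaded before any thread reads it.
    __syncthreads();
    if (idx < dim) {
        float temp = 0.0f;
        // Dot product of row idx with the vector. Adjacent threads read
        // addresses dim floats apart, so these loads are not coalesced.
        for (int i = 0; i < dim; i++) {
            temp += A[idx * dim + i] * vec[i];
        }
        x[idx] = temp;
    }
}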
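For the sparse case, one straightforward change keeps the same one-thread-per-row structure but has each thread loop only over the stored nonzeros of its row in CSR arrays. The sketch below is a CPU reference under that assumption: `csrRowDot` is what a single thread would compute, and the loop over `row` in the hypothetical `csrSpmv` stands in for `idx = blockIdx.x * blockDim.x + threadIdx.x`. Because `b` is now read through `colIdx` (a gather), copying all of `b` into shared memory buys less than in the dense kernel and is still limited by shared-memory size.

```cpp
#include <cassert>

// What one GPU thread would compute in a one-thread-per-row CSR kernel:
// the dot product of row `row` with b, touching only stored nonzeros.
inline float csrRowDot(int row, const int* rowPtr, const int* colIdx,
                       const float* values, const float* b) {
    float sum = 0.0f;
    for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k)
        sum += values[k] * b[colIdx[k]];  // gather from b by column index
    return sum;
}

// CPU reference (hypothetical helper). In a CUDA kernel the loop disappears:
//   if (idx < rows) x[idx] = csrRowDot(idx, rowPtr, colIdx, values, b);
void csrSpmv(int rows, const int* rowPtr, const int* colIdx,
             const float* values, const float* b, float* x) {
    for (int row = 0; row < rows; ++row)
        x[row] = csrRowDot(row, rowPtr, colIdx, values, b);
}
```

With one thread per row, threads in a warp do unequal work when row lengths differ, and their reads of `values`/`colIdx` are not coalesced. That is why the choice of storage format matters on the GPU.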
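- What changes do I need to make to my CUDA code to account for the matrix being sparse?
- I found on a forum that padding can also be used to optimize performance, but this requires changing the way I read/order the matrix. Any ideas on how to implement this padding in the way I read the matrix and perform the computation?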
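One common meaning of "padding" here is the ELLPACK (ELL) format. Every row is padded with explicit zeros to the length of the longest row, and the slots are stored column-major (slot `k` of row `i` at `k * rows + i`). In a one-thread-per-row kernel, threads `i, i+1, ...` then read consecutive addresses at each step `k`, and all threads run the same number of iterations. The sketch below assumes CSR arrays as input. `EllMatrix`, `csrToEll` and `ellSpmv` are hypothetical names, and padding slots use column 0 with value 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// ELLPACK storage (hypothetical struct): rows padded to `width` slots,
// stored column-major so slot k of row i lives at k * rows + i.
struct EllMatrix {
    int rows = 0, width = 0;    // width = max nonzeros in any row
    std::vector<int> colIdx;    // rows * width; padding slots use column 0
    std::vector<float> values;  // rows * width; padding slots hold 0.0f
};

// Converts CSR arrays into padded, column-major ELL.
EllMatrix csrToEll(int rows, const std::vector<int>& rowPtr,
                   const std::vector<int>& colIdx, const std::vector<float>& values) {
    EllMatrix E;
    E.rows = rows;
    for (int i = 0; i < rows; ++i)
        E.width = std::max(E.width, rowPtr[i + 1] - rowPtr[i]);
    E.colIdx.assign(static_cast<std::size_t>(rows) * E.width, 0);
    E.values.assign(static_cast<std::size_t>(rows) * E.width, 0.0f);
    for (int i = 0; i < rows; ++i)
        for (int k = 0; k < rowPtr[i + 1] - rowPtr[i]; ++k) {
            std::size_t slot = static_cast<std::size_t>(k) * rows + i;  // column-major slot
            E.colIdx[slot] = colIdx[rowPtr[i] + k];
            E.values[slot] = values[rowPtr[i] + k];
        }
    return E;
}

// One-thread-per-row ELL product; on the GPU, `i` would be the thread index.
std::vector<float> ellSpmv(const EllMatrix& E, const std::vector<float>& b) {
    std::vector<float> x(E.rows, 0.0f);
    for (int i = 0; i < E.rows; ++i) {
        float sum = 0.0f;
        for (int k = 0; k < E.width; ++k) {
            std::size_t slot = static_cast<std::size_t>(k) * E.rows + i;
            sum += E.values[slot] * b[E.colIdx[slot]];  // padding adds 0 * b[0]
        }
        x[i] = sum;
    }
    return x;
}
```

The trade-off is that padding costs memory and wasted multiply-adds in proportion to how much the row lengths vary. ELL suits matrices whose rows have similar nonzero counts.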
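The correct answer depends entirely on the storage format of the sparse matrix. See http://www.nvidia.com/object/nvidia_research_pub_001.html, which discusses the merits of different sparse formats on the GPU. – talonmies 2011-05-12 09:38:59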