Computing the Euclidean distance between two matrices in CUDA

I'm writing a program in CUDA, and the problem is the following:

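I have two matrices, A (n * 128) and B (m * 128).
I take the first row of A and compute, one by one, the distance between that vector and every row of B.
I write each distance into one row of a matrix C, so that element C(i,j) holds the distance between row i of A and row j of B.

Then I move on to the next row of A.

I implemented it this way: I have a grid of (n * m) blocks, with 128 threads per block (1 * 128).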
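As a baseline for checking the kernel's output, the computation described above can be sketched on the host in plain C++. This is only an illustrative reference, not code from the original post: the names `DIM` and `referenceDistances` are hypothetical, and row-major storage is assumed because that is how the kernel indexes A, B, and C.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed row length, matching the 128 columns described in the question.
constexpr std::size_t DIM = 128;

// Hypothetical host-side helper (not part of the original post).
// A is n x DIM and B is m x DIM, both row-major. Returns C as n x m, row-major,
// with C[i * m + j] = Euclidean distance between row i of A and row j of B.
std::vector<float> referenceDistances(const std::vector<float>& A,
                                      const std::vector<float>& B,
                                      std::size_t n, std::size_t m)
{
    std::vector<float> C(n * m);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < DIM; ++k) {
                const float d = A[i * DIM + k] - B[j * DIM + k];
                sum += d * d;  // accumulate squared differences
            }
            C[i * m + j] = std::sqrt(sum);
        }
    }
    return C;
}
```

Note that the kernel as posted stores the sum of squared differences without taking a square root, so either compare against the squared value or apply the square root on the GPU side as well.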

The program compiles, but the problem is that it doesn't give correct distances. I can't figure out what I'm doing wrong...

PS: I have CUDA 6.0 with an NVIDIA GTX 650 (compute capability 3.0)

__global__ void EuclidianDistances(float *A, float *B, float *C, int n, int m)
{
    // SIZE is equal to 128
    __shared__ float accumResult[SIZE];
    __shared__ float sA[SIZE];
    __shared__ float sB[SIZE];

    // MAPPING
    int bx = blockIdx.x;  // n
    int by = blockIdx.y;  // m
    int ty = threadIdx.y; // 128
    int tx = threadIdx.x; // 1

    sA[ty] = A[bx * SIZE + ty];
    sB[ty] = B[by * SIZE + ty];
    __syncthreads();

    accumResult[ty] = (sA[ty] - sB[ty]) * (sA[ty] - sB[ty]);
    __syncthreads();

    // Parallel tree-reduction
    for (int stride = SIZE/2 ; stride < 0 ; stride >>= 1)
        if (ty < stride)
        {
            accumResult[ty] += accumResult[stride + ty];
            __syncthreads();
        }

    // Writing results to output matrix
    if ((threadIdx.y == 0))
        C[bx * m + by] = accumResult[ty];
    __syncthreads();
}

Asked 2014-06-05 by Madhatter

'(ty ... – Levans

Also: the condition seems wrong: 'for (int stride = SIZE/2; stride < 0; stride >>= 1)' –

@Levans: Sorry, 'pas' is 'stride'. I just corrected it. – Madhatter

The condition looks wrong:

for (int stride = SIZE/2 ; stride < 0 ; stride >>= 1)

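Assuming SIZE is 128, as you said, this loop body will never execute. Also, the __syncthreads() inside the if statement can stall the whole thing: a barrier has to be reached by every thread in the block, so it should not sit in a branch that only some threads take.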

Edit: after reading the OP's comment, I realized this is a language issue. Here is a snippet:

#include <iostream> 
using namespace std; 

int main() { 

    int SIZE = 128; 

    for (int stride = SIZE/2 ; stride < 0 ; stride >>= 1) 
     cout << "Hello I'm running" << endl; 



    return 0; 
}

http://ideone.com/AyhXYF

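The output is: nothing. Look at the for loop syntax in C++: the second part is the condition that must stay true for the loop to keep running. If you start with a false condition, your loop never executes.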
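For contrast, here is a rough host-side sketch of the same reduction loop with the condition written as `stride > 0`. It only emulates on the CPU what the threads of one block would do, so the name `treeReduceSum` and the inner loop over `ty` are illustrative assumptions rather than the OP's kernel; a power-of-two input length such as 128 is also assumed.

```cpp
#include <cstddef>
#include <vector>

// Host-side emulation of the tree reduction, for illustration only.
// Hypothetical helper; assumes values.size() is a power of two (e.g. 128).
float treeReduceSum(std::vector<float> values)
{
    if (values.empty()) {
        return 0.0f;
    }
    // Runs while stride > 0: 64, 32, ..., 1 for 128 elements.
    for (std::size_t stride = values.size() / 2; stride > 0; stride >>= 1) {
        // On the GPU, each pass of this inner loop corresponds to one thread
        // with ty < stride; the barrier would come after the `if`, not inside it.
        for (std::size_t ty = 0; ty < stride; ++ty) {
            values[ty] += values[ty + stride];
        }
    }
    return values[0];  // the total ends up in element 0
}
```

With the original `stride < 0` condition the outer loop would be skipped entirely and element 0 would still hold only its own squared difference, which is consistent with the wrong distances reported in the question.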

Answered 2014-06-05 15:28:43

I just changed 'pas' to 'stride'. Sorry, my mistake. So why is this condition wrong? – Madhatter

@Jeb11's answer still stands: didn't you mean 'stride > 0' rather than 'stride < 0'? – Levans

@Levans It's working now with 'stride > 0', thanks. I think I'm going to delete this post, the solution is so obvious.. – Madhatter
