Overloading the CUDA shuffle function makes the original function invisible

I am trying to implement my own 64-bit shuffle function in CUDA. However, if I write it like this:

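static __inline__ __device__ double __shfl_xor(double var, int laneMask, int width=warpSize)
{
    int hi, lo;
    // Split the 64-bit double into its low and high 32-bit words
    asm volatile("mov.b64 { %0, %1 }, %2;" : "=r"(lo), "=r"(hi) : "d"(var));
    // Intended to use the built-in 32-bit (int) shuffle for each word
    hi = __shfl_xor(hi, laneMask, width);
    lo = __shfl_xor(lo, laneMask, width);
    // Reassemble the double from the shuffled words
    return __hiloint2double(hi, lo);
}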
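For reference, the split that the inline PTX performs can be reproduced on the host. `mov.b64 { lo, hi }, var` places the low 32 bits of the double in `lo` and the high 32 bits in `hi`, and `__hiloint2double(hi, lo)` puts them back together, so shuffling both words moves the whole 64-bit value. The following is a host-only C++ sketch of that round trip; `HiLo`, `split_double`, `join_double`, and `check_round_trip` are hypothetical names, not CUDA APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical host-side helpers (not CUDA APIs) mirroring the question's
// "mov.b64 { lo, hi }" split and the __hiloint2double(hi, lo) recombination.
struct HiLo {
    std::uint32_t hi;  // bits 63..32 of the double
    std::uint32_t lo;  // bits 31..0 of the double
};

HiLo split_double(double var) {
    std::uint64_t bits;
    std::memcpy(&bits, &var, sizeof bits);  // reinterpret the bit pattern without UB
    return {static_cast<std::uint32_t>(bits >> 32), static_cast<std::uint32_t>(bits)};
}

double join_double(std::uint32_t hi, std::uint32_t lo) {
    std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
    double out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// Usage check: split + join must reproduce the exact bit pattern
// (compared with memcmp so that -0.0 and NaN payloads are covered too).
void check_round_trip(double x) {
    HiLo p = split_double(x);
    double y = join_double(p.hi, p.lo);
    assert(std::memcmp(&x, &y, sizeof x) == 0);
}
```

For example, `split_double(1.0)` yields `hi == 0x3FF00000` and `lo == 0`, because 1.0 is `0x3FF0000000000000` in IEEE 754 binary64.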

all subsequent calls to __shfl_xor will be instantiated from this 64-bit version, regardless of the argument type. For example, if I do

int a; 
a = __shfl_xor(a, 16);

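it will still use the double version. A workaround would be to use a different function name. But because I call this shuffle function from a template function, using a different name means I have to create a separate version for 64-bit floating point, which is not ideal.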

So how can I overload the __shfl_xor(double, ...) function while still making sure that __shfl_xor(int, ...) is called where appropriate?


2013-04-11 Rainn

@RoBiK Thanks for your comment... – Rainn 2013-04-11 21:49:48

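All integral types and float can be converted to double. When choosing between the built-in function and your specialized double function, the compiler may pick your function for all types.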
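Strictly by the C++ rules, an `int` argument is an exact match for an `int` parameter and beats a conversion to `double`, so the double version should only capture `int` calls when the built-in `int` overload is not visible at the call site. One mechanism that produces exactly this symptom is name hiding: a declaration in an inner scope (for example a namespace) stops unqualified lookup before the outer overloads are found, and a using-declaration brings them back into the same overload set. The thread does not say whether this is what happened in the asker's build; the sketch uses hypothetical host functions named `shfl`, not the CUDA intrinsics.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the header's built-in 32-bit overload.
std::string shfl(int) { return "::shfl(int)"; }

namespace hidden {
// Declared in an inner scope: unqualified lookup from inside hidden::
// finds this name first and stops, so ::shfl(int) is never a candidate
// and the int argument is converted to double.
std::string shfl(double) { return "hidden::shfl(double)"; }
std::string call_with_int() { return shfl(42); }
}  // namespace hidden

namespace visible {
using ::shfl;  // bring the outer overload into the same overload set
std::string shfl(double) { return "visible::shfl(double)"; }
std::string call_with_int() { return shfl(42); }      // exact int match wins
std::string call_with_double() { return shfl(4.2); }  // exact double match wins
}  // namespace visible

// Usage check of the lookup behavior described above.
void check_lookup() {
    assert(hidden::call_with_int() == "hidden::shfl(double)");
    assert(visible::call_with_int() == "::shfl(int)");
    assert(visible::call_with_double() == "visible::shfl(double)");
}
```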

Have you tried giving the function a different name and providing a specialized double variant plus pass-through variants for the other types?

For example:

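static __inline__ __device__ double foo_shfl_xor(double var, int laneMask, int width=warpSize)
{
    // Your double shuffle implementation, e.g. the one from the question:
    // split into two 32-bit words, shuffle each, recombine
    int hi, lo;
    asm volatile("mov.b64 { %0, %1 }, %2;" : "=r"(lo), "=r"(hi) : "d"(var));
    hi = __shfl_xor(hi, laneMask, width);
    lo = __shfl_xor(lo, laneMask, width);
    return __hiloint2double(hi, lo);
}

static __inline__ __device__ int foo_shfl_xor(int var, int laneMask, int width=warpSize)
{
    // For every non-double data type you use,
    // just call the original shuffle function
    return __shfl_xor(var, laneMask, width);
}

// Your code that uses shuffle (example kernel name is illustrative)
__global__ void example_kernel(double *dout, int *iout)
{
    double d = threadIdx.x;
    int a = threadIdx.x;
    d = foo_shfl_xor(d, 16); // Calls your custom shuffle
    a = foo_shfl_xor(a, 16); // Calls default shuffle
    dout[threadIdx.x] = d;
    iout[threadIdx.x] = a;
}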
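Because `foo_shfl_xor` is an overload set rather than two different names, a single template can call it and let overload resolution pick the 64-bit or 32-bit path, which is the template concern raised in the question. The following is a host-side C++ emulation under stated assumptions: a full 32-lane warp, `width == 32`, and lanes modeled as array elements. `Warp`, `emulate_shfl_xor`, and `warp_allreduce_sum` are hypothetical, and on the device the `double` overload would hold the hi/lo split from the question instead of moving whole values.

```cpp
#include <array>
#include <cassert>

constexpr int kWarpSize = 32;

// Hypothetical host model of a warp: one value per lane.
template <typename T>
using Warp = std::array<T, kWarpSize>;

// Emulated full-warp xor shuffle (not a CUDA API): lane i reads lane (i ^ laneMask).
template <typename T>
Warp<T> emulate_shfl_xor(const Warp<T>& v, int laneMask) {
    assert(laneMask >= 0 && laneMask < kWarpSize);
    Warp<T> out{};
    for (int lane = 0; lane < kWarpSize; ++lane) out[lane] = v[lane ^ laneMask];
    return out;
}

// Emulated versions of the answer's overload set.
Warp<double> foo_shfl_xor(const Warp<double>& v, int laneMask) {
    return emulate_shfl_xor(v, laneMask);  // device: hi/lo split of each double
}
Warp<int> foo_shfl_xor(const Warp<int>& v, int laneMask) {
    return emulate_shfl_xor(v, laneMask);  // device: built-in 32-bit shuffle
}

// One template for every supported type: the call to foo_shfl_xor is
// resolved per T, so no separately named 64-bit template is needed.
// Xor butterfly: after log2(32) = 5 steps every lane holds the total.
template <typename T>
Warp<T> warp_allreduce_sum(Warp<T> v) {
    for (int mask = kWarpSize / 2; mask > 0; mask /= 2) {
        Warp<T> other = foo_shfl_xor(v, mask);
        for (int lane = 0; lane < kWarpSize; ++lane) v[lane] += other[lane];
    }
    return v;
}
```

With lanes holding `0..31`, every lane ends up with 496 for `int`; with lanes holding `0.5 * i`, every lane ends up with 248.0 for `double`.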


2013-04-12 06:31:19

It works, thanks a lot. But I still wonder whether there is a simpler solution. – Rainn 2013-04-12 15:16:35
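One alternative that keeps a single name without writing per-type overloads, sketched here and not taken from the thread, is a template that handles any trivially copyable type by cutting each value into 32-bit words and shuffling each word position separately; for `double` that is exactly the hi/lo scheme. This is a host-side emulation: `Warp`, `shfl_xor_word`, and `shfl_xor_any` are hypothetical, and on the device each word would go through the 32-bit `__shfl_xor`. Separately, in CUDA 9 and later the `__shfl_xor_sync` intrinsics are documented as accepting `double` directly, so on newer toolkits a hand-written 64-bit version may not be needed at all.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

constexpr int kWarpSize = 32;

// Hypothetical host model of a warp: one value per lane.
template <typename T>
using Warp = std::array<T, kWarpSize>;

// Emulated 32-bit xor shuffle (not a CUDA API): lane i reads lane (i ^ laneMask).
Warp<std::uint32_t> shfl_xor_word(const Warp<std::uint32_t>& v, int laneMask) {
    assert(laneMask >= 0 && laneMask < kWarpSize);
    Warp<std::uint32_t> out{};
    for (int lane = 0; lane < kWarpSize; ++lane) out[lane] = v[lane ^ laneMask];
    return out;
}

// Generic shuffle under one name: split each lane's value into 32-bit words,
// shuffle every word position with the 32-bit shuffle, then copy back.
template <typename T>
Warp<T> shfl_xor_any(const Warp<T>& v, int laneMask) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    constexpr std::size_t kWords = (sizeof(T) + sizeof(std::uint32_t) - 1) / sizeof(std::uint32_t);
    std::array<Warp<std::uint32_t>, kWords> words{};  // words[w][lane]
    for (int lane = 0; lane < kWarpSize; ++lane) {
        std::uint32_t buf[kWords] = {};  // zero-pads a trailing partial word
        std::memcpy(buf, &v[lane], sizeof(T));
        for (std::size_t w = 0; w < kWords; ++w) words[w][lane] = buf[w];
    }
    for (auto& word : words) word = shfl_xor_word(word, laneMask);  // one shuffle per word
    Warp<T> out{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        std::uint32_t buf[kWords];
        for (std::size_t w = 0; w < kWords; ++w) buf[w] = words[w][lane];
        std::memcpy(&out[lane], buf, sizeof(T));
    }
    return out;
}
```

An `int` needs one word and a `double` two, so the same template covers both cases that the question's template function calls.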
