Using CUDA math functions in a __global__ function in caffe


For various reasons, I have to change the caffe source code. The modified code is shown below.

Headers

#include <algorithm> 
#include <cfloat> 
#include <vector> 
#include "caffe/layer.hpp" 
#include "caffe/util/math_functions.hpp" 
#include "caffe/vision_layers.hpp"

Modified code

template <typename Dtype> 
__global__ void LPPoolForward(const int nthreads, 
const Dtype* const bottom_data, const int num, const int channels, 
const int height, const int width, const int pooled_height, 
const int pooled_width, const int kernel_h, const int kernel_w, 
const int stride_h, const int stride_w, const int pad_h, const int pad_w,float p, 
Dtype* const top_data) { 
CUDA_KERNEL_LOOP(index, nthreads) { 
const int pw = index % pooled_width; 
const int ph = (index/pooled_width) % pooled_height; 
const int c = (index/pooled_width/pooled_height) % channels; 
const int n = index/pooled_width/pooled_height/channels; 
int hstart = ph * stride_h - pad_h; 
int wstart = pw * stride_w - pad_w; 
int hend = min(hstart + kernel_h, height + pad_h); 
int wend = min(wstart + kernel_w, width + pad_w); 
hstart = max(hstart, 0); 
wstart = max(wstart, 0); 
hend = min(hend, height); 
wend = min(wend, width); 
Dtype lp = 0; 
double p1=p; 
const Dtype* const bottom_slice =bottom_data + (n * channels + c)*height* width; 
for (int h = hstart; h < hend; ++h) { 
    for (int w = wstart; w < wend; ++w) { 

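    // Variant 1 (tried first): exponent is the double p1
    lp += pow(bottom_slice[h * width + w],p1); 
    // Variant 2 (tried instead of variant 1): exponent is the float p
    lp += pow(bottom_slice[h * width + w],p); 
    } 
} 
    top_data[index] = pow(lp,1/p1); 
} 
}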

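From the question "Using CUDA math functions in a __global__ function - Nsight Eclipse Edition", I know that pow() must be given either all double-precision or all single-precision arguments. The problem is that when I use p1 (double), as in lp += pow(bottom_slice[h * width + w],p1), I get this: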

calling a __host__ function("std::pow< float, double> ") from a __global__ function is not allowed

and when I use p (float), as in lp += pow(bottom_slice[h * width + w],p), I get this:

error: calling a __host__ function("std::pow< double, float> ") from a __global__ function("caffe::LPPoolForward ") is not allowed

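Why does the type of the first argument also change when I change the precision of the second argument of pow? I am not very familiar with caffe, so does anyone have an idea how to solve this?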

Source

2016-10-31 Wang TZ

Are you modifying a templated kernel? You have not provided enough information for anyone to answer your question. – talonmies

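Now that you have shown some more code, the reason is obvious. The kernel in question is a template, which means the code can be instantiated for both single-precision and double-precision types. The first argument to pow has type Dtype, so it is float in one instantiation and double in the other, while your exponent has a fixed type. By fixing the code for the single-precision instantiation, you break it for the double-precision one, and vice versa.

The solution is to declare the intermediate variable as Dtype. Then the arguments will always match, whichever type the kernel is instantiated for, and pow will not cause a problem during compilation.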
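
The fix can be sketched outside of caffe as follows. This is only an illustration, not caffe code: the helper name lp_pool_window and the HD macro are hypothetical, and the sketch assumes that nvcc accepts std::pow in device code when both arguments have the same type. The per-window loop is the one from the question, with every intermediate declared as Dtype.

```cpp
#include <cmath>

// Assumption: HD expands to CUDA qualifiers under nvcc and to nothing
// for a plain C++ compiler, so the same source builds in both cases.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper (not part of caffe): Lp norm over one pooling window.
// bottom_slice points at one channel of one image, stored row-major.
template <typename Dtype>
HD Dtype lp_pool_window(const Dtype* bottom_slice, int width,
                        int hstart, int hend, int wstart, int wend, Dtype p) {
  Dtype lp = 0;
  for (int h = hstart; h < hend; ++h) {
    for (int w = wstart; w < wend; ++w) {
      // Both arguments are Dtype, so the call resolves to
      // pow(float, float) or pow(double, double), never a mixed overload.
      lp += std::pow(bottom_slice[h * width + w], p);
    }
  }
  // Dtype(1) / p keeps the exponent in the same precision as lp.
  return std::pow(lp, Dtype(1) / p);
}
```

In the kernel from the question, the equivalent change would be to replace double p1=p; with const Dtype p1 = p; and to pass p1 as the exponent in every pow call.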

Source

2016-10-31 12:08:18 talonmies

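To add some background: the immediate cause of the observed error is that CUDA [1] uses the host toolchain's 'math.h' and [2] provides device-code implementations of 'pow(float,float)', 'pow(double,double)', 'pow(float,int)' and 'pow(double,int)'. As a consequence, the compiler cannot find a device-code implementation of 'pow(float,double)' or 'pow(double,float)', but it evidently does find prototypes for the host-side versions of these functions. Host functions cannot be called from device code, however, which leads to the errors shown in the question. – njuffa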
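
The overload behaviour described in this comment can be checked on the host side with a few static assertions. The sketch below is an illustration under assumptions: pow_as and the HD macro are hypothetical names, and it relies only on the standard C++11 rule that mixed float/double arguments to std::pow are promoted to double. Casting the exponent to the type of the base is an alternative to declaring the variable as Dtype, and it selects one of the matching overloads listed above.

```cpp
#include <cmath>
#include <type_traits>

// Assumption: HD expands to CUDA qualifiers under nvcc and to nothing
// for a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Matching argument types keep their precision.
static_assert(std::is_same<decltype(std::pow(1.0f, 2.0f)), float>::value,
              "pow(float, float) returns float");
static_assert(std::is_same<decltype(std::pow(1.0, 2.0)), double>::value,
              "pow(double, double) returns double");
// Mixed argument types pick the host library's promoting overload,
// which is the host-only function named in the error messages.
static_assert(std::is_same<decltype(std::pow(1.0f, 2.0)), double>::value,
              "pow(float, double) is promoted to double");
static_assert(std::is_same<decltype(std::pow(1.0, 2.0f)), double>::value,
              "pow(double, float) is promoted to double");

// Hypothetical helper: cast the exponent to the base type so that a
// matching overload is always selected.
template <typename Dtype, typename Exp>
HD Dtype pow_as(Dtype base, Exp exponent) {
  return std::pow(base, static_cast<Dtype>(exponent));
}
```

With this helper, pow_as(x, p1) has the type of x whether x is float or double, independent of the type of p1.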

That solved the problem, thank you –
