Thrust 1.7 tabulate fails on a CUDA device

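The new thrust::tabulate function works for me on the host but not on the device. The device is a K20x with compute capability 3.5. The host is an Ubuntu machine with 128 GB of memory. Help?

I don't think unified addressing is the problem, since I can sort a unified-addressed array on the device.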

#include <iostream> 
#include <thrust/device_vector.h> 
#include <thrust/execution_policy.h> 
#include <thrust/tabulate.h> 
#include <thrust/version.h> 

using namespace std; 

// Print an expression's name then its value, possibly followed by a
// comma or endl. Ex: cout << PRINTC(x) << PRINTN(y);
#define PRINT(arg) #arg "=" << (arg) 
#define PRINTC(arg) #arg "=" << (arg) << ", " 
#define PRINTN(arg) #arg "=" << (arg) << endl 

// Execute an expression and check for CUDA errors. 
#define CE(exp) { \
    cudaError_t e; e = (exp); \
    if (e != cudaSuccess) { \
        cerr << #exp << " failed at line " << __LINE__ << " with error " << cudaGetErrorString(e) << endl; \
        exit(1); \
    } \
}

const int N(10); 

int main(void) { 
    int major = THRUST_MAJOR_VERSION; 
    int minor = THRUST_MINOR_VERSION; 
    cout << "Thrust v" << major << "." << minor 
    << ", CUDA_VERSION: " << CUDA_VERSION << ", CUDA_ARCH: " << __CUDA_ARCH__ 
    << endl; 
    cout << PRINTN(N); 
    cudaDeviceProp prop; 
    cudaGetDeviceProperties(&prop, 0); 
    if (!prop.unifiedAddressing) {
        cerr << "Unified addressing not available." << endl;
        exit(1);
    }
    cudaGetDeviceProperties(&prop, 0);
    if (!prop.canMapHostMemory) {
        cerr << "Can't map host memory." << endl;
        exit(1);
    }
    cudaSetDeviceFlags(cudaDeviceMapHost); 

    int *p, *q; 
    CE(cudaHostAlloc(&p, N*sizeof(int), cudaHostAllocMapped)); 
    CE(cudaHostAlloc(&q, N*sizeof(int), cudaHostAllocMapped)); 

    thrust::tabulate(thrust::host, p, p+N, thrust::negate<int>()); 
    thrust::tabulate(thrust::device, q, q+N, thrust::negate<int>()); 

    for (int i=0; i<N; i++)
        cout << PRINTC(i) << PRINTC(p[i]) << PRINTN(q[i]);
}

Output:

Thrust v1.7, CUDA_VERSION: 6000, CUDA_ARCH: 0 
N=10 
i=0, p[i]=0, q[i]=0 
i=1, p[i]=-1, q[i]=0 
i=2, p[i]=-2, q[i]=0 
i=3, p[i]=-3, q[i]=0 
i=4, p[i]=-4, q[i]=0 
i=5, p[i]=-5, q[i]=0 
i=6, p[i]=-6, q[i]=0 
i=7, p[i]=-7, q[i]=0 
i=8, p[i]=-8, q[i]=0 
i=9, p[i]=-9, q[i]=0

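The following adds no information to my post, but it is required before the site will accept it: most of the program is error checking and version checking.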


2014-03-06 WRF

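This problem appears to be fixed in the current Thrust master branch. The master branch currently identifies itself as Thrust v1.8.

I ran your code with CUDA 6 RC (which appears to be what you are using) and was able to reproduce your observation.

I then updated to the master branch, removed the __CUDA_ARCH__ macro from your code, and got the expected results (the host and device tabulations match).

Note that, according to the programming guide, the __CUDA_ARCH__ macro is only defined in code that is being compiled by the device code compiler. It is officially undefined in host code. It is therefore acceptable to use it in host code as follows: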

#ifdef __CUDA_ARCH__

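but not the way you are using it. Yes, I know that Thrust v1.7 and Thrust master behave differently in this respect, but that appears to (also) be a Thrust issue that has been sorted out in the master branch. I expect both of these issues to be fixed whenever the next version of Thrust is incorporated into an official CUDA drop. Since we are very close to the official CUDA 6.0 release, I would be surprised if these issues were addressed in CUDA 6.0.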
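As a minimal sketch of the `#ifdef __CUDA_ARCH__` pattern described above, the version banner in the question could obtain the value through a small helper instead of streaming the macro directly. The helper name `cuda_arch_or_zero` is hypothetical and not part of CUDA or Thrust; the sketch only assumes that the macro is undefined during host compilation, as the programming guide states.

```cpp
// Hypothetical helper (not part of CUDA or Thrust): reports the architecture
// this translation unit is currently being compiled for.
int cuda_arch_or_zero() {
#ifdef __CUDA_ARCH__
    // Only seen by the device code compiler, where the macro is defined.
    return __CUDA_ARCH__;
#else
    // Host compilation pass (or a plain C++ compiler): the macro is undefined.
    return 0;
#endif
}
```

In host code, the banner line would then stream `cuda_arch_or_zero()` in place of `__CUDA_ARCH__`, so the macro is only ever tested with `#ifdef` and never evaluated.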

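Further notes on the tabulate issue:

- One workaround is to update Thrust to master.
- In my testing, the issue does not seem to be specific to thrust::tabulate. Many of the Thrust functions I tested seem to fail: when used with thrust::device and raw pointers, they do not write values correctly (they seem to write all zeros), but they do seem to read values correctly (for example, thrust::reduce seems to work).
- Another possible workaround is to wrap the raw pointers in thrust::device_ptr<> using thrust::device_pointer_cast<>(). That also seemed to work for me.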
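The last workaround can be sketched as follows. This is an illustration under assumptions, not the answerer's exact test code: it reuses the mapped host allocation from the question, relies on unified addressing so that the host pointer is also valid on the device, and drops the thrust::device execution policy in favor of a thrust::device_ptr<int> wrapper.

```cuda
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/functional.h>
#include <thrust/tabulate.h>

int main() {
    const int N = 10;

    // Enable mapped (zero-copy) host memory, as in the question.
    if (cudaSetDeviceFlags(cudaDeviceMapHost) != cudaSuccess) return EXIT_FAILURE;

    int *q = NULL;
    if (cudaHostAlloc(&q, N * sizeof(int), cudaHostAllocMapped) != cudaSuccess)
        return EXIT_FAILURE;

    // Wrap the raw pointer; Thrust then selects the device backend from the
    // iterator type instead of from an explicit execution policy.
    // Assumption: unified addressing makes q usable as a device pointer.
    thrust::device_ptr<int> dq = thrust::device_pointer_cast(q);
    thrust::tabulate(dq, dq + N, thrust::negate<int>());

    // Make sure the device writes have completed before reading on the host.
    cudaDeviceSynchronize();

    for (int i = 0; i < N; i++)
        std::printf("i=%d, q[i]=%d\n", i, q[i]);

    cudaFreeHost(q);
    return EXIT_SUCCESS;
}
```

Whether this avoids the all-zero output on a given Thrust 1.7 installation should be checked against the host results, since the answer only reports that the wrapper seemed to work in its own testing.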


2014-03-06 19:25:41
