Using multiple GPUs with CUDA Thrust

I want to use both of my graphics cards for CUDA Thrust computations.

I have two graphics cards. Running on a single card works very well, even when I store two device_vectors in a std::vector.

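If I use both cards at the same time, the first iteration of the loop works and causes no error. After the first iteration, it fails, possibly because a device pointer is invalid.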

I'm not sure what the exact problem is, or how to use both cards for the computation.

Minimal code example:

std::vector<thrust::device_vector<float> > TEST() {
    std::vector<thrust::device_vector<float> > vRes;

    unsigned int iDeviceCount = GetCudaDeviceCount();
    for(unsigned int i = 0; i < iDeviceCount; i++) {
        checkCudaErrors(cudaSetDevice(i));
        thrust::host_vector<float> hvConscience(1024);

        // first run works, runs afterwards cause errors ..
        vRes.push_back(hvConscience); // this push_back causes the error on exec
    }
    return vRes;
}

Error message on execution:

terminate called after throwing an instance of 'thrust::system::system_error' 
what(): invalid argument


2013-06-02 dgrat

Are you using host copies or host buffers? –

I don't know what you mean. This code copies from host to device. – dgrat

So they're not in SLI mode? –

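The problem here is that you are trying to perform a device-to-device copy of data between a pair of device_vectors that reside in different GPU contexts (because of the cudaSetDevice call). What you have perhaps overlooked is that this sequence of operations: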

thrust::host_vector<float> hvConscience(1024); 
vRes.push_back(hvConscience);

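performs a copy from hvConscience at each loop iteration: a device_vector is built from the host data and then copied into vRes. In particular, when vRes has to grow, std::vector copy-constructs its existing elements into new storage, so the device_vector that was allocated on GPU 0 gets copied into a new allocation made while GPU 1 is current. The Thrust backend expects the source and destination memory to lie in the same GPU context. Here they do not, hence the error.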

What you probably want to do instead is work with a vector of pointers to device_vector, something like:

typedef thrust::device_vector<float> vec; 
typedef vec *p_vec; 
std::vector<p_vec> vRes; 

unsigned int iDeviceCount = GetCudaDeviceCount(); 
for(unsigned int i = 0; i < iDeviceCount; i++) { 
    cudaSetDevice(i); 
    p_vec hvConscience = new vec(1024); 
    vRes.push_back(hvConscience); 
}

[Disclaimer: code written in the browser, never compiled or tested, use at your own risk]

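This way, you create each vector only once, in the correct GPU context, and then copy-assign a host pointer, which never triggers a device-side copy between memory spaces.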


2013-06-03 11:49:12 talonmies
