2013-03-14 34 views
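C++ AMP crashing on hardware (GeForce GTX 660)

I'm having a problem with some C++ AMP code, and I've included a sample below. It runs fine on the emulated accelerator, but on my hardware (Windows 7, NVIDIA GeForce GTX 660, latest drivers) it crashes the display driver, and I can't see anything wrong with my code.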

Is there a problem with my code, or is this a hardware/driver/compiler issue?

#include "stdafx.h" 

#include <vector> 
#include <iostream> 
#include <amp.h> 

int _tmain(int argc, _TCHAR* argv[]) 
{ 
    // Prints "NVIDIA GeForce GTX 660" 
    concurrency::accelerator_view target_view = concurrency::accelerator().create_view(); 
    std::wcout << target_view.accelerator.description << std::endl; 

    // lower numbers do not cause the issue 
    const int x = 2000; 
    const int y = 30000; 

    // 1d array for storing result 
    std::vector<unsigned int> resultVector(y); 
    Concurrency::array_view<unsigned int, 1> resultsArrayView(resultVector.size(), resultVector); 

    // 2d array for data for processing 
    std::vector<unsigned int> dataVector(x * y); 
    concurrency::array_view<unsigned int, 2> dataArrayView(y, x, dataVector); 
    parallel_for_each(
     // Define the compute domain, which is the set of threads that are created. 
     resultsArrayView.extent, 
     // Define the code to run on each thread on the accelerator. 
     [=](concurrency::index<1> idx) restrict(amp) 
    { 
     concurrency::array_view<unsigned int, 1> buffer = dataArrayView[idx[0]]; 
     unsigned int bufferSize = buffer.get_extent().size(); 

     // needs both loops to cause crash 
     for (unsigned int outer = 0; outer < bufferSize; outer++) 
     { 
      for (unsigned int i = 0; i < bufferSize; i++) 
      { 
       // works without this line, also if I change to buffer[0] it works? 
       dataArrayView[idx[0]][0] = 0; 
      } 
     } 
     // works without this line 
     resultsArrayView[0] = 0; 
    }); 

    std::cout << "crash on next line" << std::endl; 
    resultsArrayView.synchronize(); 
    std::cout << "will never reach me" << std::endl; 

    system("PAUSE"); 
    return 0; 
} 

Answer

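Your computation is most likely exceeding the permitted quantum time (2 seconds by default). When that happens, the operating system steps in and forcibly resets the GPU; this is called Timeout Detection and Recovery (TDR). The software adapter (reference device) does not have TDR enabled, which is why the computation can exceed the permitted quantum time there without being interrupted.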

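Does your computation really need 30000 threads (variable y), each performing 2000 * 2000 (x * x) loop iterations? You can split the computation into chunks so that each chunk takes less than 2 seconds to compute. You can also consider disabling TDR or raising the allowed quantum time to fit your needs.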
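
As a rough illustration of the chunking idea, here is a minimal sketch in portable C++ (no amp.h needed) that splits a 1-D compute domain into slices and counts the loop iterations a single thread of the question's kernel performs. The names `Chunk`, `make_chunks` and `iterations_per_thread` are hypothetical helpers made up for this example; they are not part of C++ AMP, and any chunk size would have to be tuned on the actual hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper type (not a C++ AMP type): one contiguous slice
// of the 1-D compute domain.
struct Chunk {
    int offset; // index of the first thread in the slice
    int count;  // number of threads in the slice
};

// Split `total` threads into slices of at most `chunk_size` threads each.
// The last slice may be shorter when `total` is not a multiple of `chunk_size`.
std::vector<Chunk> make_chunks(int total, int chunk_size) {
    assert(total >= 0 && chunk_size > 0);
    std::vector<Chunk> chunks;
    int offset = 0;
    while (offset < total) {
        const int count = std::min(chunk_size, total - offset);
        chunks.push_back(Chunk{offset, count});
        offset += count; // never exceeds `total`
    }
    return chunks;
}

// Loop iterations executed by one thread of the question's kernel:
// both nested loops run over one row of the 2-D view.
long long iterations_per_thread(int row_length) {
    return static_cast<long long>(row_length) * row_length;
}
```

With the question's constants, `iterations_per_thread(2000)` gives 4,000,000 iterations per thread, and `make_chunks(30000, 1000)` gives 30 slices of 1000 threads. Each slice could then be dispatched as its own `parallel_for_each`, for example over `concurrency::extent<1>(chunk.count)` with `chunk.offset` added to `idx[0]` inside the kernel, waiting for it to finish before launching the next one. Whether a slice of that size actually completes within the 2-second limit depends on the GPU, so treat 1000 as a placeholder.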

I highly recommend reading the blog post on handling TDRs in C++ AMP, which explains TDR in detail: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/07/handling-tdrs-in-c-amp.aspx

In addition, here is a separate blog post on how to disable TDR on Windows 8: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/06/disabling-tdr-on-windows-8-for-your-c-amp-algorithms.aspx

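Thank you so much, I was starting to lose my mind over this. I never knew TDR existed. I've updated my code and now it works. Thanks for the amazing answer! – 2013-03-15 19:17:11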