Timing issue with clock_gettime() in CUDA

I am trying to write a CUDA program in which I can see firsthand the speed-up CUDA provides for applications.

Here is CUDA code I have written using Thrust (http://code.google.com/p/thrust/).

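In short, all the code does is create two integer vectors of length 2^23, one on the host and an identical one on the device, and sort them. It also (attempts to) measure the time taken for each sort.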

On the host vector I use std::sort. On the device vector I use thrust::sort.

To compile I use

    nvcc sortcompare.cu -lrt

The output of the program at the terminal is

Desktop: ./a.out

Host Time taken is: 19 . 224622882 seconds

Device Time taken is: 19 . 321644143 seconds

Desktop:

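The first std::cout statement is printed after 19.224 seconds. However, the second std::cout statement (even though it reports 19.32 seconds) is printed immediately after the first one. Note that I have used different timestamps for the two clock_gettime() measurements, namely ts_host and ts_device.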

I am using CUDA 4.0 and an NVIDIA GTX 570 (compute capability 2.0).

    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <stdlib.h>

    // For timings
    #include <time.h>
    // Necessary thrust headers
    #include <thrust/sort.h>
    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/copy.h>

    int main(int argc, char *argv[])
    {
        int N = 23;
        thrust::host_vector<int> H(1 << N);     // create a vector of 2^N elements on host
        thrust::device_vector<int> D(1 << N);   // The same on the device.
        thrust::host_vector<int> dummy(1 << N); // Copy the D to dummy from GPU after sorting

        // Set the host_vector elements.
        for (int i = 0; i < H.size(); ++i) {
            H[i] = rand(); // Set the host vector element to pseudo-random number.
        }

        // Sort the host_vector. Measure time
        // Reset the clock
        timespec ts_host;
        ts_host.tv_sec = 0;
        ts_host.tv_nsec = 0;
        clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts_host); // Start clock

        thrust::sort(H.begin(), H.end());

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_host); // Stop clock
        std::cout << "\nHost Time taken is: " << ts_host.tv_sec << " . " << ts_host.tv_nsec << " seconds" << std::endl;

        D = H; // Set the device vector elements equal to the host_vector
        // Sort the device vector. Measure time.
        timespec ts_device;
        ts_device.tv_sec = 0;
        ts_device.tv_nsec = 0;
        clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts_device); // Start clock

        thrust::sort(D.begin(), D.end());
        thrust::copy(D.begin(), D.end(), dummy.begin());

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_device); // Stop clock
        std::cout << "\nDevice Time taken is: " << ts_device.tv_sec << " . " << ts_device.tv_nsec << " seconds" << std::endl;

        return 0;
    }

Source

2011-11-11 smilingbuddha

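Your question is unclear. You say your two times differ by 0.1 seconds, and a difference that small is practically imperceptible to the human eye. What is the problem?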

I have edited the question to make it clearer. – smilingbuddha

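Oh, I see. Well, your problem evidently has nothing to do with CUDA or Thrust, so I suggest removing those tags, and perhaps simplifying your example code (just use a 'sleep' call or something).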

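You are not checking the return value of clock_settime. I would guess it is failing, probably with errno set to EPERM or EINVAL. Read the documentation and always check your return values!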
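To see whether the reset actually happens, the call can be wrapped so that a failure is reported rather than silently ignored. The following is a minimal sketch; the helper name `try_reset_clock` is hypothetical and not part of the question's code, and which errno value you get depends on the system.

```cpp
#include <cerrno>
#include <cstring>
#include <iostream>
#include <time.h>

// Hypothetical helper (not from the original code): tries to reset the given
// clock to zero. Returns 0 on success, otherwise the errno value that
// clock_settime reported.
int try_reset_clock(clockid_t clock_id)
{
    timespec zero;
    zero.tv_sec = 0;
    zero.tv_nsec = 0;

    if (clock_settime(clock_id, &zero) != 0) {
        int err = errno; // save errno before another call can overwrite it
        std::cerr << "clock_settime failed: " << std::strerror(err) << std::endl;
        return err;
    }
    return 0;
}
```

Calling `try_reset_clock(CLOCK_PROCESS_CPUTIME_ID)` in place of the bare `clock_settime` calls in the question would show whether either "Start clock" line has any effect.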

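If I am right, you are not resetting the clock as you think you are, so the second timing is cumulative with the first, plus some extra work that you did not intend to count at all.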

The right way to do this is to call only clock_gettime: store the first result, do the computation, then subtract the initial time from the end time.
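The start/stop subtraction described above could look roughly like the sketch below. The helper names `elapsed_seconds` and `time_host_sort` are hypothetical, and the example sorts a plain `std::vector` with `std::sort` so that it compiles without CUDA or Thrust.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <time.h>
#include <vector>

// Hypothetical helper: returns (end - start) in seconds.
double elapsed_seconds(const timespec& start, const timespec& end)
{
    return static_cast<double>(end.tv_sec - start.tv_sec)
         + static_cast<double>(end.tv_nsec - start.tv_nsec) * 1e-9;
}

// Hypothetical helper: times std::sort on n pseudo-random ints.
// Returns the elapsed CPU time in seconds, or -1.0 if clock_gettime fails.
double time_host_sort(std::size_t n)
{
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = std::rand();
    }

    timespec start, stop;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start) != 0) {
        return -1.0; // could not read the clock before the work
    }
    std::sort(v.begin(), v.end());
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop) != 0) {
        return -1.0; // could not read the clock after the work
    }
    return elapsed_seconds(start, stop);
}
```

The same pattern, with a fresh start/stop pair around `thrust::sort` and `thrust::copy`, should give a device figure that no longer includes the host sort. The clock ID is kept from the question; `CLOCK_PROCESS_CPUTIME_ID` counts CPU time consumed by the process, so a wall clock such as `CLOCK_MONOTONIC` may be the better fit when the work runs on the GPU.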

Source

2011-11-11 03:39:25
