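Performance difference between .cu and .cpp files

For this research we have to analyze the performance difference between the CPU and the GPU. My problem is that I have a .cu file that contains only C++ code and a .cpp file with exactly the same code, yet the .cu file runs 3 times faster than the .cpp file. The .cu file is compiled by NVCC, but NVCC itself only compiles CUDA code, and since there is no CUDA code here, everything is handed to the host C++ compiler. That is my problem: I cannot explain the performance difference.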
#include <iostream>
#include <cmath>    // sqrt
#include <cstdlib>  // rand
#include <ctime>    // clock
#include <conio.h>  // _getch (Windows only)
#include <cuda.h>
#include <cuda_runtime.h> // Stops underlining of __global__
#include <device_launch_parameters.h> // Stops underlining of threadIdx etc.
using namespace std;
void FindClosestCPU(float3* points, int* indices, int count) {
    // Base case, if there's 1 point don't do anything
    if(count <= 1) return;
    // Loop through every point
    for(int curPoint = 0; curPoint < count; curPoint++) {
        // Distance to the nearest point found so far, initialised to the largest float
        float distToClosest = 3.40282e38f;
        // See how far it is from every other point
        for(int i = 0; i < count; i++) {
            // Don't check distance to itself
            if(i == curPoint) continue;
            float dist = sqrt((points[curPoint].x - points[i].x) *
                              (points[curPoint].x - points[i].x) +
                              (points[curPoint].y - points[i].y) *
                              (points[curPoint].y - points[i].y) +
                              (points[curPoint].z - points[i].z) *
                              (points[curPoint].z - points[i].z));
            if(dist < distToClosest) {
                distToClosest = dist;
                indices[curPoint] = i;
            }
        }
    }
}
int main()
{
    // Number of points
    const int count = 10000;
    // Arrays of points
    int *indexOfClosest = new int[count];
    float3 *points = new float3[count];
    // Create a list of random points
    for(int i = 0; i < count; i++)
    {
        points[i].x = (float)((rand()%10000) - 5000);
        points[i].y = (float)((rand()%10000) - 5000);
        points[i].z = (float)((rand()%10000) - 5000);
    }
    // This variable is used to keep track of the fastest time so far
    long fastest = 1000000;
    // Run the algorithm 2 times
    for(int q = 0; q < 2; q++)
    {
        clock_t startTime = clock();
        // Run the algorithm
        FindClosestCPU(points, indexOfClosest, count);
        clock_t finishTime = clock();
        // clock() counts ticks, so convert to milliseconds before printing
        long elapsed = (long)((finishTime - startTime) * 1000.0 / CLOCKS_PER_SEC);
        cout<<"Run "<<q<<" took "<<elapsed<<" millis"<<endl;
        // If that run was faster update the fastest time so far
        if(elapsed < fastest)
            fastest = elapsed;
    }
    // Print out the fastest time
    cout<<"Fastest time: "<<fastest<<" millis"<<endl;
    // Print the final results to screen
    cout<<"Final results:"<<endl;
    for(int i = 0; i < 10; i++)
        cout<<i<<"."<<indexOfClosest[i]<<endl;
    // Deallocate ram
    delete[] indexOfClosest;
    delete[] points;
    _getch();
    return 0;
}
The only difference between the two files is that one is a .cu file compiled by NVCC, and the other is a .cpp file compiled by the regular C++ compiler.
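One way to check which toolchain actually handles each file is to ask the preprocessor. The following is a minimal sketch, not part of the original question: `compilerDriver` and `hostCompiler` are hypothetical helper names. It relies on `__CUDACC__`, which nvcc defines when it compiles a CUDA source file, and on the usual vendor macros for the host compiler.

```cpp
#include <string>

// Hypothetical helper: reports whether this translation unit is being
// driven by nvcc. __CUDACC__ is defined by nvcc for CUDA source files.
std::string compilerDriver() {
#if defined(__CUDACC__)
    return "nvcc";
#else
    return "host compiler only";
#endif
}

// Hypothetical helper: reports which host compiler generates the CPU code.
// Clang is tested before GCC because Clang also defines __GNUC__.
std::string hostCompiler() {
#if defined(_MSC_VER)
    return "MSVC";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";
#endif
}
```

Printing both values from `main` in the .cu and the .cpp build should show the same host compiler in each case if NVCC really forwards the host code, which would point to compiler flags rather than the compiler itself as the source of any timing gap.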
What compile commands are you using in each case? What are the actual measured times in each case on your system?
I compile it with Visual Studio 2012. The .cu version takes ~2000 ms and the .cpp version takes ~8000 ms. I always launch the program with "Start Without Debugging". – user3107260
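For comparing numbers like these between two builds, a wall-clock timer with a fixed unit can be easier to reason about than raw `clock()` ticks. Below is a rough sketch using `std::chrono::steady_clock`; `bestOfMillis` is a hypothetical helper name and is not taken from the code in the question.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` `runs` times and returns the fastest
// wall-clock time in milliseconds. Returns -1.0 if `runs` is not positive.
double bestOfMillis(const std::function<void()>& work, int runs) {
    double best = -1.0;
    for (int q = 0; q < runs; ++q) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto finish = std::chrono::steady_clock::now();
        // Convert the elapsed duration to fractional milliseconds
        double ms = std::chrono::duration<double, std::milli>(finish - start).count();
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}
```

With the code from the question, a call might look like `bestOfMillis([&] { FindClosestCPU(points, indexOfClosest, count); }, 2)`.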
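OK, this was my mistake. I had been using the Debug configuration the whole time instead of the Release configuration. When I tried it under the Release configuration, there was no time difference. – user3107260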
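A small guard against this kind of mix-up is to print the build configuration next to every timing result. The sketch below is an illustration under assumptions: `buildConfiguration` is a hypothetical helper, and it assumes the project defines `_DEBUG` in Debug builds (MSVC does this when a debug runtime library is selected) and `NDEBUG` in Release builds, as the default Visual Studio project templates typically do.

```cpp
#include <string>

// Hypothetical helper: reports which configuration macros are visible
// to this translation unit.
// Assumption: Debug builds define _DEBUG and Release builds define NDEBUG.
std::string buildConfiguration() {
#if defined(_DEBUG)
    return "Debug (_DEBUG defined)";
#elif defined(NDEBUG)
    return "Release (NDEBUG defined)";
#else
    return "unknown (neither _DEBUG nor NDEBUG defined)";
#endif
}
```

Writing this string out before the "Fastest time" line makes it visible when a benchmark was taken from an unoptimized build.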