thrust::sort_by_key is much slower than qsort

I found that thrust::sort_by_key is much slower than qsort, which makes me think parallel sorting performs poorly. Why is that?

Data set size 100: qsort time is 0.000026 s, GPU_sort time is 0.000912 s.

Data set size 1000: qsort time is 0.000205 s, GPU_sort time is 0.003177 s.

Data set size 10000: qsort time is 0.001598 s, GPU_sort time is 0.031547 s.

Data set size 100000: qsort time is 0.018564 s, GPU_sort time is 0.31230 s.

Data set size 1000000: qsort time is 0.219892 s, GPU_sort time is 3.138608 s.

Data set size 10000000: qsort time is 2.581469 s, GPU_sort time is 85.456543 s.

Here is my code:

#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <thrust/sort.h>

struct HashValue {
    int id_;
    float proj_;
};

// qsort comparator: order by proj_ ascending, break ties by id_ ascending.
int HashValueQsortComp(const void* e1, const void* e2)
{
    const HashValue* value1 = (const HashValue*) e1;
    const HashValue* value2 = (const HashValue*) e2;

    int ret = 0;
    if (value1->proj_ < value2->proj_) {
        ret = -1;
    } else if (value1->proj_ > value2->proj_) {
        ret = 1;
    } else {
        if (value1->id_ < value2->id_) ret = -1;
        else if (value1->id_ > value2->id_) ret = 1;
    }
    return ret;
}


const int N = 10; 

void sort_test()
{
    clock_t start_time = (clock_t)-1.0;
    clock_t end_Time = (clock_t)-1.0;

    HashValue *hashValue = new HashValue[N];
    srand((unsigned)time(NULL));

    for (int i = 0; i < N; i++)
    {
        hashValue[i].id_ = i;
        hashValue[i].proj_ = rand()/(float)(RAND_MAX/1000);
    }

    start_time = clock();
    qsort(hashValue, N, sizeof(HashValue), HashValueQsortComp);
    end_Time = clock();
    printf("The qsort time is %.6f\n", ((float)end_Time - start_time)/CLOCKS_PER_SEC);

    float *keys = new float[N];
    int *values = new int[N];
    for (int i = 0; i < N; i++)
    {
        keys[i] = hashValue[i].proj_;
        values[i] = hashValue[i].id_;
    }
    start_time = clock();
    thrust::sort_by_key(keys, keys+N, values);
    end_Time = clock();
    printf("The GPU_sort time is %.6f\n", ((float)end_Time - start_time)/CLOCKS_PER_SEC);

    delete[] hashValue;
    hashValue = NULL;

    delete[] keys;
    keys = NULL;

    delete[] values;
    values = NULL;
}

2017-04-13 user2431522

The device is a K40, and my CPU is 1200.468 MHz, GenuineIntel – user2431522

You do understand that the thrust sort is not running on the GPU? – talonmies

The variables (keys, values) that you pass to the thrust sort:

thrust::sort_by_key(keys, keys+N, values);

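are host variables. This means that thrust will dispatch the host path, so the algorithm does not run on the GPU. Refer to the thrust quickstart guide to learn more about thrust, and see the linked example of using thrust with device variables.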

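Evidently the host-dispatched thrust sort is slower than your qsort implementation. If you use the device path (and time only the thrust sort operation), it may be faster.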
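As a rough sketch of what the device path could look like (not the code from the linked example): the host arrays are copied into thrust::device_vector containers, thrust::sort_by_key is called on the device iterators, and only the sort itself is timed with CUDA events. The function name device_sort_by_key_ms is made up for illustration, and it assumes the file is compiled with nvcc on a machine with a working CUDA device.

```cuda
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Hypothetical helper (name is an assumption, not from the question):
// sorts n key/value pairs on the GPU and returns the elapsed time of the
// sort alone, in milliseconds. The host arrays are overwritten with the
// sorted result.
float device_sort_by_key_ms(float* keys, int* values, int n)
{
    // Host-to-device copies happen here, outside the timed region.
    thrust::device_vector<float> d_keys(keys, keys + n);
    thrust::device_vector<int> d_values(values, values + n);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    // Device iterators make thrust dispatch the CUDA backend instead of
    // the host path.
    thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_values.begin());
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the sort has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    // Device-to-host copies, again outside the timed region.
    thrust::copy(d_keys.begin(), d_keys.end(), keys);
    thrust::copy(d_values.begin(), d_values.end(), values);
    return ms;
}
```

In the question's sort_test(), the call thrust::sort_by_key(keys, keys+N, values) could be replaced by device_sort_by_key_ms(keys, values, N). Whether the GPU then beats qsort depends on the problem size and on whether the transfer time is counted, so the result is worth measuring rather than assuming. Note also that this sorts by key only, whereas the qsort comparator breaks ties on id_.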

2017-04-15 00:09:58

Thanks for your answer – user2431522
