How can I profile multiple sequentially launched OpenCL kernels with a single clFinish?

I have several kernels that are launched one after another, like this:

clEnqueueNDRangeKernel(..., kernel1, ...);
clEnqueueNDRangeKernel(..., kernel2, ...);
clEnqueueNDRangeKernel(..., kernel3, ...);

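The kernels all share one global buffer. Right now I profile each kernel execution separately and sum the results to get the total execution time, by adding the following block after each clEnqueueNDRangeKernel: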

clFinish(cmdQueue);
status = clGetEventProfilingInfo(...,&starttime,...);
clGetEventProfilingInfo(...,&endtime,...);
time_spent = endtime - starttime;

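My question is: how can I profile the three kernels together with a single clFinish? (For example, by adding one clFinish() after the last kernel launch.)

Yes, I give each clEnqueueNDRangeKernel its own timing event, and I still get a large negative number. Details: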

clEnqueueNDRangeKernel(cmdQueue,...,&timing_event1); 
clFinish(cmdQueue); 
clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime1,NULL); 
clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime1,NULL); 
time_spent1 = endtime1 - starttime1; 

clEnqueueNDRangeKernel(cmdQueue,...,&timing_event2); 
clFinish(cmdQueue); 
clGetEventProfilingInfo(timing_event2,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime2,NULL); 
clGetEventProfilingInfo(timing_event2,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime2,NULL); 
time_spent2 = endtime2 - starttime2; 

clEnqueueNDRangeKernel(cmdQueue,...,&timing_event3); 
clFinish(cmdQueue); 
clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime3,NULL); 
clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime3,NULL); 
time_spent3 = endtime3 - starttime3; 

time_spent_all_0 = time_spent1 + time_spent2 + time_spent3; 
time_spent_all_1 = endtime3 - starttime1;

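If I keep a clFinish after every launch, all the profiling values are reasonable, but time_spent_all_1 is about twice time_spent_all_0 or more. If I remove every clFinish except the last one, none of the profiling values are reasonable.

Thanks to Eric Bainville, I got the result I wanted: profiling multiple clEnqueueNDRangeKernel calls with a single clFinish. Here is the final code I used: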
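A likely reason time_spent_all_1 is larger than time_spent_all_0 is that endtime3 - starttime1 also covers the gaps between kernels (for example, while the host sits in clFinish and queries the previous event), whereas the sum only counts the time each kernel was actually running. The sketch below illustrates that arithmetic with plain 64-bit integers. The struct, the function names, and the timestamps are hypothetical and chosen only for illustration; they are not values from the run above.

```c
#include <stddef.h>
#include <stdint.h>

/* Start/end device timestamps in nanoseconds for one kernel event.
 * (Hypothetical helper type; OpenCL reports these values as cl_ulong.) */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} kernel_times;

/* Sum of the individual kernel durations (time_spent_all_0 above). */
static uint64_t busy_time_ns(const kernel_times *k, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += k[i].end_ns - k[i].start_ns;
    return total;
}

/* First start to last end (time_spent_all_1 above), idle gaps included. */
static uint64_t span_ns(const kernel_times *k, size_t n)
{
    return n ? k[n - 1].end_ns - k[0].start_ns : 0;
}

/* Made-up timestamps: three kernels separated by idle gaps. */
static const kernel_times example_times[3] = {
    {1000, 1400},   /* kernel1 runs for 400 ns */
    {2000, 2300},   /* kernel2 runs for 300 ns */
    {3100, 3400},   /* kernel3 runs for 300 ns */
};
```

With these made-up numbers, busy_time_ns(example_times, 3) is 1000 and span_ns(example_times, 3) is 2400, so the span is more than twice the summed kernel time even though every individual measurement is correct.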

clEnqueueNDRangeKernel(cmdQueue,...,&timing_event1); 
clEnqueueNDRangeKernel(cmdQueue,...,&timing_event2); 
clEnqueueNDRangeKernel(cmdQueue,...,&timing_event3); 
clFinish(cmdQueue); 

clGetEventProfilingInfo(timing_event1,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&starttime,NULL); 
clGetEventProfilingInfo(timing_event3,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&endtime,NULL); 
time_spent = endtime - starttime;

Source

2012-07-06 jxj

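Each clEnqueueNDRangeKernel call creates its own cl_event: the last argument of the call is a pointer to a cl_event, and if that argument is not 0 (NULL), a new event is created.

Once the command has completed, you can query the associated event for its start/end profiling information. The event must be released after use (by calling clReleaseEvent).

clFinish blocks until all enqueued commands have completed.

So you only need to call clFinish once, and after that you can query the profiling information of all the events.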
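When turning the queried timestamps into a duration, it can help to validate them first. Profiling timestamps are unsigned 64-bit nanosecond counters (cl_ulong), so if the end value is smaller than the start value, or if the difference is stored in a signed type, the result can show up as a huge or negative number. The helper below is a minimal sketch of such a check. The function name elapsed_ms is hypothetical, and the idea that a failed or premature query produced the bad values is an assumption, not a confirmed diagnosis. Checking the return code of every clGetEventProfilingInfo call and creating the queue with CL_QUEUE_PROFILING_ENABLE are the usual first things to verify.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a start/end pair of nanosecond timestamps to milliseconds.
 * Returns false (leaving *out_ms untouched) when end_ns < start_ns,
 * which would otherwise wrap around in unsigned arithmetic.
 * Hypothetical helper, not part of the OpenCL API. */
static bool elapsed_ms(uint64_t start_ns, uint64_t end_ns, double *out_ms)
{
    if (end_ns < start_ns)
        return false;               /* implausible pair: do not subtract */
    *out_ms = (double)(end_ns - start_ns) / 1.0e6;
    return true;
}
```

A caller would pass the values read from CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END and treat a false return as a sign that the event data should not be trusted.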

Source

2012-07-06 16:41:14

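Thanks. But the problem is that if I put a single clFinish only after the last clEnqueueNDRangeKernel, the profiling result becomes a large negative number, such as -138142371031079. The results only become reasonable when I put a clFinish after every clEnqueueNDRangeKernel. – jxj 2012-07-09 02:04:53

Do you have a different event for each enqueued command? Could you update your code to show how the events are managed? – 2012-07-09 02:14:54

Yes, I always give each clEnqueueNDRangeKernel a different timing event, and I get a large negative number. – jxj 2012-07-09 07:15:18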
