Instruction replays — other causes

This is the output I get from nvprof (CUDA 5.5):
Invocations   Metric Name                        Metric Description                        Min        Max        Avg
Device "Tesla K40c (0)"
  Kernel: MyKernel(double const *, double const *, double*, int, int, int)
         60   inst_replay_overhead               Instruction Replay Overhead          0.736643   0.925197   0.817188
         60   shared_replay_overhead             Shared Memory Replay Overhead        0.000000   0.000000   0.000000
         60   global_replay_overhead             Global Memory Replay Overhead        0.108972   0.108972   0.108972
         60   global_cache_replay_overhead       Global Memory Cache Replay Overhead  0.000000   0.000000   0.000000
         60   local_replay_overhead              Local Memory Cache Replay Overhead   0.000000   0.000000   0.000000
         60   gld_transactions                   Global Load Transactions                25000      25000      25000
         60   gst_transactions                   Global Store Transactions               75000      75000      75000
         60   warp_nonpred_execution_efficiency  Warp Non-Predicated Execution Eff.     99.63%     99.63%     99.63%
         60   cf_issued                          Issued Control-Flow Instructions        44911      45265      45101
         60   cf_executed                        Executed Control-Flow Instructions      39533      39533      39533
         60   ldst_issued                        Issued Load/Store Instructions         273117     353930     313341
         60   ldst_executed                      Executed Load/Store Instructions        50016      50016      50016
         60   stall_data_request                 Issue Stall Reasons (Data Request)     65.21%     68.93%     67.86%
         60   inst_executed                      Instructions Executed                  458686     458686     458686
         60   inst_issued                        Instructions Issued                    789220     879145     837129
         60   issue_slots                        Issue Slots                            716816     803393     759614
The kernel uses 356 bytes of cmem[0] and no shared memory. There is also no register spilling. My question is: what is causing the instruction replays in this case? We see 81% replay overhead, but the per-category replay numbers don't add up to that.

Thanks!
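For reference, the total can be reproduced from the raw counters in the table above, assuming nvprof's usual definition of replay overhead as (inst_issued − inst_executed) / inst_executed; the gap between issued and executed load/store instructions then suggests where the uncategorized replays live. A quick sanity check (numbers copied from the table, formula is an assumption):

```python
# Sanity check: recompute replay overheads from the raw nvprof counters above.
# Assumes nvprof's definition:
#   inst_replay_overhead = (inst_issued - inst_executed) / inst_executed

inst_issued_avg = 837129   # Instructions Issued (avg)
inst_executed   = 458686   # Instructions Executed
ldst_issued_avg = 313341   # Issued Load/Store Instructions (avg)
ldst_executed   = 50016    # Executed Load/Store Instructions

# Total replay overhead from issued vs. executed instruction counts.
total_overhead = (inst_issued_avg - inst_executed) / inst_executed
print(f"total replay overhead  ~ {total_overhead:.3f}")   # ~0.825

# Extra load/store issues alone, expressed as a fraction of executed
# instructions, account for most of the total -- even though the
# shared/global/local replay metrics sum to only ~0.109.
ldst_overhead = (ldst_issued_avg - ldst_executed) / inst_executed
print(f"load/store issue share ~ {ldst_overhead:.3f}")    # ~0.574
```

The small mismatch with the reported 0.817 average is expected, since nvprof averages the per-invocation ratios rather than taking the ratio of the averaged counters.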
Thanks, Robert. When we want to issue a load/store instruction but all the functional units are busy, we stall. Do stalls count toward instruction replays? – user1096294
Also, in my code all threads always access the same location in constant memory, and there is only a small amount of warp divergence (it occurs in just one of the 256 blocks I use). Can you suggest other causes? – user1096294