Instruction replays — other causes

This is the output I get from nvprof (CUDA 5.5):
Invocations   Metric Name                        Metric Description                        Min        Max        Avg
Device "Tesla K40c (0)"
  Kernel: MyKernel(double const *, double const *, double*, int, int, int)
         60   inst_replay_overhead               Instruction Replay Overhead          0.736643   0.925197   0.817188
         60   shared_replay_overhead             Shared Memory Replay Overhead        0.000000   0.000000   0.000000
         60   global_replay_overhead             Global Memory Replay Overhead        0.108972   0.108972   0.108972
         60   global_cache_replay_overhead       Global Memory Cache Replay Overhead  0.000000   0.000000   0.000000
         60   local_replay_overhead              Local Memory Cache Replay Overhead   0.000000   0.000000   0.000000
         60   gld_transactions                   Global Load Transactions                25000      25000      25000
         60   gst_transactions                   Global Store Transactions               75000      75000      75000
         60   warp_nonpred_execution_efficiency  Warp Non-Predicated Execution Eff.     99.63%     99.63%     99.63%
         60   cf_issued                          Issued Control-Flow Instructions        44911      45265      45101
         60   cf_executed                        Executed Control-Flow Instructions      39533      39533      39533
         60   ldst_issued                        Issued Load/Store Instructions         273117     353930     313341
         60   ldst_executed                      Executed Load/Store Instructions        50016      50016      50016
         60   stall_data_request                 Issue Stall Reasons (Data Request)     65.21%     68.93%     67.86%
         60   inst_executed                      Instructions Executed                  458686     458686     458686
         60   inst_issued                        Instructions Issued                    789220     879145     837129
         60   issue_slots                        Issue Slots                            716816     803393     759614
The kernel uses 356 bytes of cmem[0] and no shared memory. There is also no register spilling. My question is: what is causing the instruction replays in this case? We see 81% replay overhead, but the per-category replay numbers don't add up to that.

Thanks!
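For reference, the total can be reproduced from the raw counters in the table above, assuming nvprof's usual definition of replay overhead as (inst_issued − inst_executed) / inst_executed; the gap between issued and executed load/store instructions then suggests where the uncategorized replays live. A quick sanity check (numbers copied from the table, formula is an assumption):

```python
# Sanity check: recompute replay overheads from the raw nvprof counters above.
# Assumes nvprof's definition:
#   inst_replay_overhead = (inst_issued - inst_executed) / inst_executed

inst_issued_avg = 837129   # Instructions Issued (avg)
inst_executed   = 458686   # Instructions Executed
ldst_issued_avg = 313341   # Issued Load/Store Instructions (avg)
ldst_executed   = 50016    # Executed Load/Store Instructions

# Total replay overhead from issued vs. executed instruction counts.
total_overhead = (inst_issued_avg - inst_executed) / inst_executed
print(f"total replay overhead  ~ {total_overhead:.3f}")   # ~0.825

# Extra load/store issues alone, expressed as a fraction of executed
# instructions, account for most of the total -- even though the
# shared/global/local replay metrics sum to only ~0.109.
ldst_overhead = (ldst_issued_avg - ldst_executed) / inst_executed
print(f"load/store issue share ~ {ldst_overhead:.3f}")    # ~0.574
```

The small mismatch with the reported 0.817 average is expected, since nvprof averages the per-invocation ratios rather than taking the ratio of the averaged counters.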
Thanks, Robert. When we want to issue a load/store instruction but all the functional units are busy, we stall. Do stalls count toward instruction replays? – user1096294
Also, in my code all threads always access the same location in constant memory, and there is only a small amount of warp divergence (it occurs in just one of the 256 blocks I use). Can you suggest other causes? – user1096294