Torch linear model forward pass is 4x slower on GPU than on CPU

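I am working on one of the AWS GPU instances using Torch 7. The following code benchmarks a simple forward pass of a linear model. Execution on the GPU seems to be about 4 times slower than on the CPU. What am I doing wrong?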

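require 'torch'
require 'nn'

-- Command-line options
cmd = torch.CmdLine()
cmd:option("-gpu", 0)       -- 0 = CPU, >0 = GPU
cmd:option("-n_in", 100)    -- input dimension
cmd:option("-n_out", 100)   -- output dimension
cmd:option("-n_iter", 1000) -- number of timed forward passes

params = cmd:parse(arg)
A = torch.randn(params.n_in)
model = nn.Sequential():add(nn.Linear(params.n_in, params.n_out))

if params.gpu > 0 then
    require 'cutorch'
    require 'cudnn'
    A = A:cuda()
    model = model:cuda()
    cutorch.synchronize() -- make sure setup work is finished before timing
end

timer = torch.Timer()

for i = 1, params.n_iter do
    A2 = model:forward(A)
end
-- CUDA kernels launch asynchronously; wait for them before reading the timer
if params.gpu > 0 then cutorch.synchronize() end
print("Average time: " .. timer:time().real / params.n_iter)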

2016-06-12 pavel

Try a larger size. – kangshiyin

Thanks, that seems to be it! Running with -n_in 10000 -n_out 500 gives a speedup of about 30x on the GPU. – pavel

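You need a network large enough to make full use of the GPU. For small networks (< 500 x 500), overheads such as GPU kernel launches and PCI-E data transfers take up a large share of the training time. In that case, you may be better off using the CPU.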
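
To put rough numbers on this, here is a minimal sketch in plain Lua (no Torch needed) that counts the arithmetic in one forward pass of nn.Linear(n_in, n_out) on a single input vector and feeds it into a very crude cost model, time = fixed overhead + work / throughput. The helper names `linear_flops` and `overhead_fraction`, and the constants `OVERHEAD_S` and `FLOPS_PER_S`, are hypothetical and chosen only for illustration; they are not measurements of any real GPU, and a real matrix-vector product is often limited by memory bandwidth rather than arithmetic.

```lua
-- Floating-point operations for y = W*x + b with W of size n_out x n_in:
-- each output needs n_in multiplies and n_in adds (including the bias add).
local function linear_flops(n_in, n_out)
    return 2 * n_in * n_out
end

-- Share of one call spent on fixed overhead under the crude model
--   time = overhead_s + flops / flops_per_s
local function overhead_fraction(flops, overhead_s, flops_per_s)
    local compute_s = flops / flops_per_s
    return overhead_s / (overhead_s + compute_s)
end

-- The two sizes mentioned in this thread
local small = linear_flops(100, 100)
local large = linear_flops(10000, 500)

-- ASSUMED figures, for illustration only (not measured on any device)
local OVERHEAD_S  = 10e-6   -- hypothetical fixed cost per forward call
local FLOPS_PER_S = 1e12    -- hypothetical sustained throughput

print(string.format("100 x 100:   %d flops, overhead share %.3f",
    small, overhead_fraction(small, OVERHEAD_S, FLOPS_PER_S)))
print(string.format("10000 x 500: %d flops, overhead share %.3f",
    large, overhead_fraction(large, OVERHEAD_S, FLOPS_PER_S)))
```

The larger layer does 500 times more arithmetic per call, so whatever the fixed per-call cost really is on your machine, it is spread over far more useful work. To find the actual crossover point, time both devices over a range of sizes.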

2016-06-12 14:05:24 kangshiyin
