I am working through an online tutorial on momentum-based learning with Theano and came across the method below. Is there an error in the way the parameters are updated in this Theano method?

import theano
import theano.tensor as T

def gradient_updates_momentum(cost, params, learning_rate, momentum):
    '''
    Compute updates for gradient descent with momentum
    :parameters:
        - cost : theano.tensor.var.TensorVariable
            Theano cost function to minimize
        - params : list of theano.tensor.var.TensorVariable
            Parameters to compute gradient against
        - learning_rate : float
            Gradient descent learning rate
        - momentum : float
            Momentum parameter, should be at least 0 (standard gradient descent) and less than 1
    :returns:
        updates : list
            List of updates, one for each parameter
    '''
    # Make sure momentum is a sane value
    assert momentum < 1 and momentum >= 0
    # List of update steps for each parameter
    updates = []
    # Just gradient descent on cost
    for param in params:
        # For each parameter, we'll create a param_update shared variable.
        # This variable will keep track of the parameter's update step across iterations.
        # We initialize it to 0
        param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
        # Each parameter is updated by taking a step in the direction of the gradient.
        # However, we also "mix in" the previous step according to the given momentum value.
        # Note that when updating param_update, we are using its old value and also the new gradient step.
        updates.append((param, param - learning_rate*param_update))
        # Note that we don't need to derive backpropagation to compute updates - just use T.grad!
        updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
    return updates
Shouldn't the order of the following two lines be reversed (i.e. swapped)?
updates.append((param, param - learning_rate*param_update))
and
updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
As far as I understand, the updates are only applied after the train function is executed and the cost has been computed, correct?
Doesn't that mean we should take the current cost and, together with the existing param_update value (from the previous iteration), compute the new param_update, and then use that to update the current param value?
Why is it written the other way around, and why is that correct?
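To make the timing concrete: Theano applies every pair in an updates list simultaneously, evaluating each right-hand side with the old values of the shared variables before any of them is overwritten, so the order of the two appended tuples should not affect the result. A minimal sketch of that behaviour (assuming a working Theano installation; the shared variables a and b are made up for illustration):

import theano

a = theano.shared(0.0)
b = theano.shared(0.0)
# Both right-hand sides are evaluated with the *old* values of a and b,
# regardless of the order of the tuples in the updates list.
step = theano.function([], [], updates=[(a, a + 1.0), (b, a + 10.0)])
step()
print(a.get_value(), b.get_value())  # 1.0 10.0 -- b was computed from a's old value (0.0)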
Thanks Daniel, this is a great piece of code for explaining the concept. I think the author wrote it this way because using the previous value probably does not change the algorithm.
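As a rough check of that point, here is a small plain-Python sketch (not part of the tutorial; names such as simulate and use_new_velocity are made up for illustration) comparing the two orderings on a 1-D quadratic cost f(w) = w**2, whose gradient is 2*w. Both variants converge to the minimum; the tutorial's ordering simply applies each velocity one iteration later.

def simulate(use_new_velocity, steps=200, lr=0.1, momentum=0.9):
    w = 5.0          # parameter, started away from the optimum at 0
    velocity = 0.0   # plays the role of the param_update shared variable
    for _ in range(steps):
        grad = 2.0 * w                                          # gradient of w**2
        new_velocity = momentum * velocity + (1.0 - momentum) * grad
        if use_new_velocity:
            # "swapped" order: step with the freshly computed velocity
            w = w - lr * new_velocity
        else:
            # tutorial's order: step with the previous iteration's velocity,
            # which is what Theano's simultaneous updates effectively do
            w = w - lr * velocity
        velocity = new_velocity
    return w

print(simulate(use_new_velocity=True))   # close to 0.0
print(simulate(use_new_velocity=False))  # also close to 0.0 -- the two variants behave almost identically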