I am working through an online tutorial on momentum-based learning with Theano and came across the method below. Is there an error in the way the parameters are updated in this Theano method?

import theano
import theano.tensor as T

def gradient_updates_momentum(cost, params, learning_rate, momentum):
    '''
    Compute updates for gradient descent with momentum
    :parameters:
        - cost : theano.tensor.var.TensorVariable
            Theano cost function to minimize
        - params : list of theano.tensor.var.TensorVariable
            Parameters to compute gradient against
        - learning_rate : float
            Gradient descent learning rate
        - momentum : float
            Momentum parameter, should be at least 0 (standard gradient descent) and less than 1
    :returns:
        updates : list
            List of updates, one for each parameter
    '''
    # Make sure momentum is a sane value
    assert momentum < 1 and momentum >= 0
    # List of update steps for each parameter
    updates = []
    # Just gradient descent on cost
    for param in params:
        # For each parameter, we'll create a param_update shared variable.
        # This variable will keep track of the parameter's update step across iterations.
        # We initialize it to 0
        param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
        # Each parameter is updated by taking a step in the direction of the gradient.
        # However, we also "mix in" the previous step according to the given momentum value.
        # Note that when updating param_update, we are using its old value and also the new gradient step.
        updates.append((param, param - learning_rate*param_update))
        # Note that we don't need to derive backpropagation to compute updates - just use T.grad!
        updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
    return updates
Shouldn't the order of the following two lines be reversed (i.e. swapped)?
updates.append((param, param - learning_rate*param_update))
and
updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
As far as I understand, the updates are only applied after the train function is executed and the cost has been computed, correct?
Doesn't that mean we should take the current cost and, together with the existing param_update value (from the previous iteration), compute the new param_update, and then use that to update the current param value?
Why is it written the other way around, and why is that correct?
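To make the timing concrete: Theano applies every pair in an updates list simultaneously, evaluating each right-hand side with the old values of the shared variables before any of them is overwritten, so the order of the two appended tuples should not affect the result. A minimal sketch of that behaviour (assuming a working Theano installation; the shared variables a and b are made up for illustration):

import theano

a = theano.shared(0.0)
b = theano.shared(0.0)
# Both right-hand sides are evaluated with the *old* values of a and b,
# regardless of the order of the tuples in the updates list.
step = theano.function([], [], updates=[(a, a + 1.0), (b, a + 10.0)])
step()
print(a.get_value(), b.get_value())  # 1.0 10.0 -- b was computed from a's old value (0.0)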
Thanks Daniel, this is a great piece of code for explaining the concept. I think the author wrote it this way because using the previous value probably does not change the algorithm.
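As a rough check of that point, here is a small plain-Python sketch (not part of the tutorial; names such as simulate and use_new_velocity are made up for illustration) comparing the two orderings on a 1-D quadratic cost f(w) = w**2, whose gradient is 2*w. Both variants converge to the minimum; the tutorial's ordering simply applies each velocity one iteration later.

def simulate(use_new_velocity, steps=200, lr=0.1, momentum=0.9):
    w = 5.0          # parameter, started away from the optimum at 0
    velocity = 0.0   # plays the role of the param_update shared variable
    for _ in range(steps):
        grad = 2.0 * w                                          # gradient of w**2
        new_velocity = momentum * velocity + (1.0 - momentum) * grad
        if use_new_velocity:
            # "swapped" order: step with the freshly computed velocity
            w = w - lr * new_velocity
        else:
            # tutorial's order: step with the previous iteration's velocity,
            # which is what Theano's simultaneous updates effectively do
            w = w - lr * velocity
        velocity = new_velocity
    return w

print(simulate(use_new_velocity=True))   # close to 0.0
print(simulate(use_new_velocity=False))  # also close to 0.0 -- the two variants behave almost identically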