2015-10-13 24 views

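Is there a mistake in the way parameters are updated in the following Theano method?

I am working through an online tutorial on momentum-based learning and came across this method in Theano: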
import theano
import theano.tensor as T

def gradient_updates_momentum(cost, params, learning_rate, momentum):
    '''
    Compute updates for gradient descent with momentum

    :parameters:
        - cost : theano.tensor.var.TensorVariable
            Theano cost function to minimize
        - params : list of theano.tensor.var.TensorVariable
            Parameters to compute gradient against
        - learning_rate : float
            Gradient descent learning rate
        - momentum : float
            Momentum parameter, should be at least 0 (standard gradient descent) and less than 1

    :returns:
        updates : list
            List of updates, one for each parameter
    '''
    # Make sure momentum is a sane value
    assert momentum < 1 and momentum >= 0
    # List of update steps for each parameter
    updates = []
    # Just gradient descent on cost
    for param in params:
        # For each parameter, we'll create a param_update shared variable.
        # This variable will keep track of the parameter's update step across iterations.
        # We initialize it to 0
        param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
        # Each parameter is updated by taking a step in the direction of the gradient.
        # However, we also "mix in" the previous step according to the given momentum value.
        # Note that when updating param_update, we are using its old value and also the new gradient step.
        updates.append((param, param - learning_rate*param_update))
        # Note that we don't need to derive backpropagation to compute updates - just use T.grad!
        updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
    return updates

Shouldn't the order of the following two lines be reversed (swapped)?

updates.append((param, param - learning_rate*param_update)) 

updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param))) 

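As far as I understand, the updates are run only after the train function has executed and the cost has been computed. Is that correct?

Doesn't that mean we should take the current cost together with the existing param_update value (from the previous iteration), compute the new param_update from them, and then use that to update the current parameter values?

Why is it the other way around, and why is that correct?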

Answer

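The order of the updates within the updates list passed to theano.function is ignored. Updates are always computed using the old values of the shared variables.

This snippet shows that the order of the updates is ignored: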

import theano
import theano.tensor

p = 0.5
param = theano.shared(1.)
param_update = theano.shared(2.)
cost = 3 * param * param
# Two update pairs of the form (shared variable, new value expression)
update_a = (param, param - param_update)
update_b = (param_update, p * param_update + (1 - p) * theano.grad(cost, param))
# The same two updates, listed in opposite orders
updates1 = [update_a, update_b]
updates2 = [update_b, update_a]
f1 = theano.function([], outputs=[param, param_update], updates=updates1)
f2 = theano.function([], outputs=[param, param_update], updates=updates2)
print(f1(), f1())
# Reset the shared variables before running the second function
param.set_value(1.)
param_update.set_value(2.)
print(f2(), f2())
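
For readers without Theano at hand, the same behaviour can be imitated in plain Python. This is only a sketch of the semantics described above, not how Theano is implemented: the helper `apply_updates` and the dictionary-based state are made up for illustration, and the gradient of `3 * param * param` is written by hand as `6 * param`.

```python
def apply_updates(state, updates):
    """Imitate simultaneous updates: evaluate every rule on the OLD state, then assign."""
    # Step 1: compute all new values while the state is still unchanged.
    new_values = [(name, rule(state)) for name, rule in updates]
    # Step 2: only now write the results into a copy of the state.
    new_state = dict(state)
    for name, value in new_values:
        new_state[name] = value
    return new_state


p = 0.5
# Same rules as in the Theano snippet, expressed as functions of the old state.
update_a = ("param", lambda s: s["param"] - s["param_update"])
update_b = ("param_update", lambda s: p * s["param_update"] + (1 - p) * 6 * s["param"])

start = {"param": 1.0, "param_update": 2.0}
state_ab = apply_updates(start, [update_a, update_b])
state_ba = apply_updates(start, [update_b, update_a])
print(state_ab)
print(state_ba)
```

Because every rule reads only the old state, both orderings give the same result.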

If, logically, you want

new_a = old_a + a_update 
new_b = new_a + b_update 

then you need to provide the updates like this:

new_a = old_a + a_update 
new_b = old_a + a_update + b_update 
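
Applied to the momentum method from the question, the same substitution could look like the plain-Python sketch below. It assumes a toy cost `3 * param**2` with a hand-written gradient, and the function names `step_sequential` and `step_simultaneous` are hypothetical; it is meant to illustrate the rewriting, not to reproduce the tutorial's code.

```python
def grad(param):
    # Hand-written gradient of the toy cost 3 * param**2 (an assumption for this sketch).
    return 6.0 * param


def step_sequential(param, velocity, lr, momentum):
    # "Logical" order: first refresh the velocity, then step with the NEW velocity.
    new_velocity = momentum * velocity + (1.0 - momentum) * grad(param)
    new_param = param - lr * new_velocity
    return new_param, new_velocity


def step_simultaneous(param, velocity, lr, momentum):
    # Both right-hand sides read only OLD values, as simultaneous updates require.
    # The new-velocity expression is substituted into the parameter rule.
    velocity_rule = momentum * velocity + (1.0 - momentum) * grad(param)
    param_rule = param - lr * (momentum * velocity + (1.0 - momentum) * grad(param))
    return param_rule, velocity_rule


seq = step_sequential(1.0, 2.0, 0.1, 0.5)
sim = step_simultaneous(1.0, 2.0, 0.1, 0.5)
print(seq, sim)
```

In Theano, the equivalent would be to build the new-velocity expression once and reuse that same symbolic expression in both update pairs.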

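Thanks, Daniel. That is a great piece of code for explaining the concept. I think the author wrote it this way because using the previous value possibly does not change the algorithm.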
