Theano gradient doesn't work with .sum(), only with .mean()?

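I'm trying to learn Theano and decided to implement linear regression, using the logistic regression example from the tutorial as a template. T.grad doesn't work if my cost function uses .sum(), but it does work if my cost function uses .mean(). Code snippets: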

(This does NOT work; it results in a w vector full of NaNs):

x = T.matrix('x') 
y = T.vector('y') 

w = theano.shared(rng.randn(feats), name='w') 
b = theano.shared(0., name="b") 

# now we do the actual expressions 
h = T.dot(x,w) + b # prediction is dot product plus bias 
single_error = .5 * ((h - y)**2) 
cost = single_error.sum() 
gw, gb = T.grad(cost, [w,b]) 

train = theano.function(inputs=[x,y], outputs=[h, single_error], updates = ((w, w - .1*gw), (b, b - .1*gb))) 
predict = theano.function(inputs=[x], outputs=h) 

for i in range(training_steps): 
    pred, err = train(D[0], D[1])

(This DOES work, perfectly):

x = T.matrix('x') 
y = T.vector('y') 

w = theano.shared(rng.randn(feats), name='w') 
b = theano.shared(0., name="b") 

# now we do the actual expressions 
h = T.dot(x,w) + b # prediction is dot product plus bias 
single_error = .5 * ((h - y)**2) 
cost = single_error.mean() 
gw, gb = T.grad(cost, [w,b]) 

train = theano.function(inputs=[x,y], outputs=[h, single_error], updates = ((w, w - .1*gw), (b, b - .1*gb))) 
predict = theano.function(inputs=[x], outputs=h) 

for i in range(training_steps): 
    pred, err = train(D[0], D[1])

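The only difference is cost = single_error.sum() vs. cost = single_error.mean(). What I don't understand is that the gradient should be exactly the same in both cases (one is just a scaled version of the other). So what gives?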

2014-11-25 user3121136

Try dividing your gradient descent step size by the number of training examples.
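To see why that helps, here is a minimal pure-Python sketch (not Theano) of the same squared-error cost. The dataset size n = 400, the single feature, and the noiseless targets are assumptions made for illustration; the question does not state them. It checks that the gradient of the summed cost is n times the gradient of the mean cost, so dividing the step size by n gives the same update as using .mean().

```python
import random

def gradients(xs, ys, w, b, reduce):
    """Gradient of the reduced cost 0.5 * (w*x + b - y)**2 with respect to (w, b)."""
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    gw = sum(r * x for r, x in zip(residuals, xs))
    gb = sum(residuals)
    if reduce == "mean":
        gw, gb = gw / len(xs), gb / len(xs)
    return gw, gb

random.seed(0)
n = 400  # assumed number of training examples
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [2.0 * x + 1.0 for x in xs]  # hypothetical noiseless targets

gw_sum, gb_sum = gradients(xs, ys, 0.5, 0.0, "sum")
gw_mean, gb_mean = gradients(xs, ys, 0.5, 0.0, "mean")

ratio = gw_sum / gw_mean        # equals n up to rounding
step_sum = (0.1 / n) * gw_sum   # summed cost, step size divided by n
step_mean = 0.1 * gw_mean       # mean cost, original step size
print(ratio, step_sum, step_mean)
```

So the two costs do point in the same direction, but with .sum() and a fixed step of 0.1 every update is n times longer than with .mean().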

2014-11-26 01:42:00

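The learning rate (0.1) is quite large. Using the mean divides it by the batch size, so that helps. But I'm pretty sure you should make it much smaller, not just divide it by the batch size (which is equivalent to using the mean).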

Try a learning rate of 0.001.
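As a rough illustration of the effect, the sketch below runs plain gradient descent in pure Python (not Theano) on a toy one-feature problem. The data, n = 400, and the 500 steps are assumptions for the demo, not values from the question. On this toy data, the summed cost with a step of 0.1 overshoots on every update until the weights overflow to inf/NaN, while the mean cost with 0.1 and the summed cost with 0.001 both settle near the true weight of 2.0.

```python
import math
import random

def train(xs, ys, lr, reduce, steps=500):
    """Plain gradient descent on 0.5 * (w*x + b - y)**2, reduced by sum or mean."""
    w, b = 0.0, 0.0
    scale = 1.0 / len(xs) if reduce == "mean" else 1.0
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        gw = scale * sum(r * x for r, x in zip(residuals, xs))
        gb = scale * sum(residuals)
        w, b = w - lr * gw, b - lr * gb
    return w, b

random.seed(0)
n = 400  # assumed number of training examples
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [2.0 * x + 1.0 for x in xs]  # hypothetical noiseless targets

w_sum_big, _ = train(xs, ys, lr=0.1, reduce="sum")      # diverges
w_mean_big, _ = train(xs, ys, lr=0.1, reduce="mean")    # converges
w_sum_small, _ = train(xs, ys, lr=0.001, reduce="sum")  # converges

print(math.isfinite(w_sum_big), w_mean_big, w_sum_small)
```

Whether 0.001 is small enough in general depends on the number of examples and the scale of the features, so treat it as a starting point rather than a fixed rule.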

2014-12-20 04:37:00 nouiz
