Theano gradient not working with .sum(), only .mean()?

I'm trying to learn Theano and decided to implement linear regression (using the logistic regression example from the tutorial as a template). T.grad does not work if my cost function uses .sum(), but it does work if it uses .mean(). Code snippets:
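(The snippets reference rng, feats, D, and training_steps without showing them; I set those up along the lines of the tutorial's logistic regression example, roughly like this, with placeholder sizes:)

import numpy
import theano
import theano.tensor as T

rng = numpy.random
feats = 784                              # number of input features (placeholder)
N = 400                                  # number of training examples (placeholder)
D = (rng.randn(N, feats), rng.randn(N))  # (inputs, real-valued targets)
training_steps = 10000                   # placeholder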
(This does not work; it results in a w vector full of NaNs):
x = T.matrix('x')
y = T.vector('y')
w = theano.shared(rng.randn(feats), name='w')
b = theano.shared(0., name="b")
# now we do the actual expressions
h = T.dot(x,w) + b # prediction is dot product plus bias
single_error = .5 * ((h - y)**2)
cost = single_error.sum()
gw, gb = T.grad(cost, [w,b])
train = theano.function(inputs=[x,y], outputs=[h, single_error], updates = ((w, w - .1*gw), (b, b - .1*gb)))
predict = theano.function(inputs=[x], outputs=h)
for i in range(training_steps):
    pred, err = train(D[0], D[1])
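One way I can watch the blow-up is to also return the gradient and print its norm each step (just a debugging sketch reusing the variables above, not part of my actual training code):

train_dbg = theano.function(
    inputs=[x, y],
    outputs=[cost, gw],
    updates=((w, w - .1*gw), (b, b - .1*gb)))

for i in range(10):
    c, g = train_dbg(D[0], D[1])
    # the cost and gradient norm grow every step and become NaN within a few iterations
    print(i, c, numpy.linalg.norm(g))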
(This works perfectly):
x = T.matrix('x')
y = T.vector('y')
w = theano.shared(rng.randn(feats), name='w')
b = theano.shared(0., name="b")
# now we do the actual expressions
h = T.dot(x,w) + b # prediction is dot product plus bias
single_error = .5 * ((h - y)**2)
cost = single_error.mean()
gw, gb = T.grad(cost, [w,b])
train = theano.function(inputs=[x,y], outputs=[h, single_error], updates = ((w, w - .1*gw), (b, b - .1*gb)))
predict = theano.function(inputs=[x], outputs=h)
for i in range(training_steps):
    pred, err = train(D[0], D[1])
The only difference is cost = single_error.sum() vs. single_error.mean(). What I don't understand is that the gradient should be essentially the same in both cases (one is just a scaled version of the other). So what gives?
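For what it's worth, I checked the scaling claim numerically: with N training examples, T.grad of the summed cost is exactly N times T.grad of the averaged cost (a self-contained sketch with made-up shapes):

import numpy
import theano
import theano.tensor as T

x = T.matrix('x')
y = T.vector('y')
w = theano.shared(numpy.random.randn(3), name='w')
b = theano.shared(0., name='b')
h = T.dot(x, w) + b
err = .5 * ((h - y) ** 2)

g_sum = T.grad(err.sum(), w)    # gradient of the summed cost
g_mean = T.grad(err.mean(), w)  # gradient of the averaged cost
f = theano.function([x, y], [g_sum, g_mean])

X = numpy.random.randn(4, 3)    # N = 4 examples, 3 features (arbitrary)
Y = numpy.random.randn(4)
gs, gm = f(X, Y)
print(numpy.allclose(gs, 4 * gm))  # True: the sum gradient is N times the mean gradient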