Suppose I have a simple MLP, and I want the gradient of one of its outputs with respect to the network weights while holding the other output constant.
I have the gradient of some loss function with respect to the output layer, say g = [0, -1] (that is, increasing the second output variable decreases the loss function).
If I take the gradient of g with respect to my network parameters and apply a gradient descent weight update, the second output variable should increase, but nothing is said about the first output variable, and a scaled application of the gradient will almost certainly change it (whether by increasing or decreasing it).
How can I modify my loss function, or any of the gradient calculations, to ensure that the first output does not change?
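For concreteness, here is a minimal TF1-style sketch of the setup being described (the names x, W, and out are illustrative placeholders, not from the question itself): backpropagating g = [0, -1] from the outputs into the weights with tf.gradients and grad_ys.
import tensorflow as tf
x = tf.placeholder(tf.float32, shape=[1, 3])
W = tf.Variable(tf.random_normal([3, 2], stddev=0.1))
out = tf.matmul(x, W)  # two output units, shape [1, 2]
# g = dL/d(out): the loss only "cares" about the second output
g = tf.constant([[0.0, -1.0]])
# Backpropagating g into the weights gives the direction a plain
# gradient-descent step would follow; it touches every entry of W,
# so the first output generally moves as well.
grad_W = tf.gradients(out, [W], grad_ys=g)[0]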
Update: I misunderstood the question. Here is the new answer.
To do this, you need to update only the connections between the hidden layer and the second output unit, while keeping the connections between the hidden layer and the first output unit intact.
The first approach is to introduce two sets of variables: one for the connections between the hidden layer and the first output unit, and another for the rest. Then you can combine them with tf.stack, and pass a var_list to get the corresponding derivatives. It goes like this (for demonstration only; not tested, use with care):
# Separate variable sets per output unit (hidden size 4 assumed,
# matching the full example below), so updates can target one unit.
W_h_to_out1 = tf.Variable(tf.random_normal([4, 1], stddev=0.1))
b_h_to_out1 = tf.Variable(tf.random_normal([1], stddev=0.1))
W_h_to_out2 = tf.Variable(tf.random_normal([4, 1], stddev=0.1))
b_h_to_out2 = tf.Variable(tf.random_normal([1], stddev=0.1))
out1 = tf.matmul(hidden, W_h_to_out1) + b_h_to_out1  # shape [batch, 1]
out2 = tf.matmul(hidden, W_h_to_out2) + b_h_to_out2  # shape [batch, 1]
out = tf.stack([out1, out2])                  # shape [2, batch, 1]
out = tf.transpose(tf.reshape(out, [2, -1]))  # shape [batch, 2]
loss = some_function_of(out)
optimizer = tf.train.GradientDescentOptimizer(0.1)
# Only the second unit's variables receive updates.
train_op_second_unit = optimizer.minimize(loss, var_list=[W_h_to_out2, b_h_to_out2])
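Because W_h_to_out1 and b_h_to_out1 never appear in var_list, the optimizer never writes to them; and since the hidden-layer variables are excluded as well, the first output stays exactly unchanged under train_op_second_unit.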
Another way is to use a mask. This is easier to implement and more flexible when you work with certain frameworks (e.g., slim, Keras, etc.), and it is the way I would recommend. The idea is to hide the first output unit from the loss function, while leaving the second output unit unchanged. This can be done with a binary variable: multiply something by 1 if you want to keep it, and multiply it by 0 to drop it. Here's the code:
import tensorflow as tf
import numpy as np
# let's make our tiny dataset: (x, y) pairs, where x = (x1, x2, x3), y = (y1, y2),
# and y1 = x1+x2+x3, y2 = x1^2+x2^2+x3^2
# n_sample data points
n_sample = 8
data_x = np.random.random((n_sample, 3))
data_y = np.zeros((n_sample, 2))
data_y[:, 0] += np.sum(data_x, axis=1)
data_y[:, 1] += np.sum(data_x**2, axis=1)
data_y += 0.01 * np.random.random((n_sample, 2)) # add some noise
# build graph
# suppose we have a network of shape [3, 4, 2], i.e.: one hidden layer of size 4.
x = tf.placeholder(tf.float32, shape=[None, 3], name='x')
y = tf.placeholder(tf.float32, shape=[None, 2], name='y')
mask = tf.placeholder(tf.float32, shape=[None, 2], name='mask')
W1 = tf.Variable(tf.random_normal(shape=[3, 4], stddev=0.1), name='W1')
b1 = tf.Variable(tf.random_normal(shape=[4], stddev=0.1), name='b1')
hidden = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
W2 = tf.Variable(tf.random_normal(shape=[4, 2], stddev=0.1), name='W2')
b2 = tf.Variable(tf.random_normal(shape=[2], stddev=0.1), name='b2')
out = tf.matmul(hidden, W2) + b2
loss = tf.reduce_mean(tf.square(out - y))
# multiply out by mask, so out[:, 0] is "invisible" to loss2 and no gradient propagates through it
masked_out = mask * out
loss2 = tf.reduce_mean(tf.square(masked_out - y))
optimizer = tf.train.GradientDescentOptimizer(0.1)
train_op_all = optimizer.minimize(loss) # update all variables in the network
train_op12 = optimizer.minimize(loss, var_list=[W2, b2]) # update hidden -> output layer
train_op2 = optimizer.minimize(loss2, var_list=[W2, b2]) # update hidden -> second output unit
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
mask_out1 = np.zeros((n_sample, 2))
mask_out1[:, 1] += 1.0
# print(mask_out1)
print(sess.run([hidden, out, loss, loss2], feed_dict={x: data_x, y: data_y, mask: mask_out1}))
# Only the second output unit is updated here; both loss and loss2 decrease.
sess.run(train_op2, feed_dict={x: data_x, y: data_y, mask: mask_out1})
print(sess.run([hidden, out, loss, loss2], feed_dict={x: data_x, y: data_y, mask: mask_out1}))
# Both output units are updated here; both loss and loss2 decrease.
sess.run(train_op12, feed_dict={x: data_x, y: data_y, mask: mask_out1})
print(sess.run([hidden, out, loss, loss2], feed_dict={x: data_x, y: data_y, mask: mask_out1}))
# All variables are updated here; both loss and loss2 decrease.
sess.run(train_op_all, feed_dict={x: data_x, y: data_y, mask: mask_out1})
print(sess.run([hidden, out, loss, loss2], feed_dict={x: data_x, y: data_y, mask: mask_out1}))
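# Sanity check: train_op2 only updates W2 and b2, and the masked loss sends
# zero gradient into W2[:, 0] and b2[0], so the first output column should
# stay exactly unchanged by this step.
out_before = sess.run(out, feed_dict={x: data_x})
sess.run(train_op2, feed_dict={x: data_x, y: data_y, mask: mask_out1})
out_after = sess.run(out, feed_dict={x: data_x})
print(np.abs(out_after[:, 0] - out_before[:, 0]).max())  # expect ~0.0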
sess.close()
======================= Below is the old answer =======================
To get the derivatives with respect to different variables, you can pass a var_list to decide which variable(s) to update. Here is an example:
import tensorflow as tf
import numpy as np
# let's make our tiny dataset: (x, y) pairs, where x = (x1, x2, x3), y = (y1, y2),
# and y1 = x1+x2+x3, y2 = x1^2+x2^2+x3^2
# n_sample data points
n_sample = 8
data_x = np.random.random((n_sample, 3))
data_y = np.zeros((n_sample, 2))
data_y[:, 0] += np.sum(data_x, axis=1)
data_y[:, 1] += np.sum(data_x**2, axis=1)
data_y += 0.01 * np.random.random((n_sample, 2)) # add some noise
# build graph
# suppose we have a network of shape [3, 4, 2], i.e.: one hidden layer of size 4.
x = tf.placeholder(tf.float32, shape=[None, 3], name='x')
y = tf.placeholder(tf.float32, shape=[None, 2], name='y')
W1 = tf.Variable(tf.random_normal(shape=[3, 4], stddev=0.1), name='W1')
b1 = tf.Variable(tf.random_normal(shape=[4], stddev=0.1), name='b1')
hidden = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
W2 = tf.Variable(tf.random_normal(shape=[4, 2], stddev=0.1), name='W2')
b2 = tf.Variable(tf.random_normal(shape=[2], stddev=0.1), name='b2')
out = tf.matmul(hidden, W2) + b2
loss = tf.reduce_mean(tf.square(out - y))
optimizer = tf.train.GradientDescentOptimizer(0.1)
# You can pass a variable list to decide which variable(s) to minimize.
train_op_second_layer = optimizer.minimize(loss, var_list=[W2, b2])
# If there is no var_list, all variables will be updated.
train_op_all = optimizer.minimize(loss)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
print(sess.run([W1, b1, W2, b2, loss], feed_dict={x: data_x, y: data_y}))
# Only W2 and b2 are updated here; the loss decreases.
sess.run(train_op_second_layer, feed_dict={x: data_x, y: data_y})
print(sess.run([W1, b1, W2, b2, loss], feed_dict={x: data_x, y: data_y}))
# All variables are updated here; the loss decreases.
sess.run(train_op_all, feed_dict={x: data_x, y: data_y})
print(sess.run([W1, b1, W2, b2, loss], feed_dict={x: data_x, y: data_y}))
sess.close()
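If you want the gradients themselves rather than an update op, the same var_list idea carries over to the standard TF1 APIs; a minimal sketch (reusing optimizer, loss, W2, and b2 from the example above):
# Compute gradients only w.r.t. the second-layer variables...
grads_and_vars = optimizer.compute_gradients(loss, var_list=[W2, b2])
# ...optionally inspect or modify them, then apply:
train_op_manual = optimizer.apply_gradients(grads_and_vars)
# Equivalently, raw gradients via tf.gradients:
grad_W2, grad_b2 = tf.gradients(loss, [W2, b2])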
How about setting 'trainable=False'? [Variables](https://www.tensorflow.org/versions/r0.12/api_docs/python/state_ops/variables) – xxi
That's not the same thing - the problem is that both outputs are affected by changes to the weights - applying an output's gradient to the weights causes both outputs to change, but we want the gradient to somehow account for the fact that one output should remain unchanged after the gradient step – Robert
@Robert Oh, I see. I misunderstood your question. I'll update my answer. – soloice