I'm new to TensorFlow and have been looking at the example here. I would like to rewrite the multilayer perceptron classification model as a regression model, but I ran into some strange behaviour when modifying the loss function. It works fine with tf.reduce_mean, but if I try to use tf.reduce_sum it outputs nan. This seems odd, since the two functions are very similar; the only difference is that the mean divides the summed result by the number of elements, so I don't see how that change could introduce nan. The loss function works with reduce_mean, but not with reduce_sum:
import numpy as np
import tensorflow as tf

# Parameters
learning_rate = 0.001

# Network Parameters
n_hidden_1 = 32  # 1st layer number of features
n_hidden_2 = 32  # 2nd layer number of features
n_input = 2      # number of inputs
n_output = 1     # number of outputs

# Make artificial data
SAMPLES = 1000
X = np.random.rand(SAMPLES, n_input)
T = np.c_[X[:, 0]**2 + np.sin(X[:, 1])]

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_output])

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with tanh activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.tanh(layer_1)
    # Hidden layer with tanh activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.tanh(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_output]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_output]))
}

pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
#se = tf.reduce_sum(tf.square(pred - y))   # Why does this give nans?
mse = tf.reduce_mean(tf.square(pred - y))  # When this doesn't?
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(mse)

# Initializing the variables
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

training_epochs = 10
display_step = 1

# Training cycle
for epoch in range(training_epochs):
    avg_cost = 0.
    # Loop over all batches
    for i in range(100):
        # Run optimization op (backprop) and cost op (to get loss value)
        _, msev = sess.run([optimizer, mse], feed_dict={x: X, y: T})
    # Display logs per epoch step
    if epoch % display_step == 0:
        print("Epoch:", '%04d' % (epoch+1), "mse=", "{:.9f}".format(msev))
The offending variable se is commented out; it should be used in place of mse.
With mse the output looks like this:
Epoch: 0001 mse= 0.051669389
Epoch: 0002 mse= 0.031438075
Epoch: 0003 mse= 0.026629323
...
and with se it ends up like this:
Epoch: 0001 se= nan
Epoch: 0002 se= nan
Epoch: 0003 se= nan
...
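For reference, the point about the two reductions being almost the same can be checked in isolation. Below is a minimal standalone snippet (illustrative only, using the same TF 1.x session API as the script above) showing that reduce_mean really is just reduce_sum divided by the number of elements:

import tensorflow as tf

# Illustrative check: reduce_mean equals reduce_sum divided by the element count.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # 4 elements
total = tf.reduce_sum(tf.square(a))         # 1 + 4 + 9 + 16 = 30
mean = tf.reduce_mean(tf.square(a))         # 30 / 4 = 7.5

with tf.Session() as demo_sess:
    print(demo_sess.run([total, mean]))     # [30.0, 7.5]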
Thanks! I tried dividing the learning rate by the number of samples and that worked. In the future I'll compute the per-sample error, but it is good to understand why it behaves this way :)
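To see why dividing the learning rate by the sample count compensates, here is a small standalone sketch (the names w_demo and x_demo are made up for illustration) checking that the gradient of the summed loss is SAMPLES times the gradient of the averaged loss; with 1000 samples the effective step taken by GradientDescentOptimizer is therefore 1000 times larger under reduce_sum, which would explain the divergence to nan:

import numpy as np
import tensorflow as tf

# Illustrative sketch: grad(sum loss) = N * grad(mean loss), so using
# learning_rate / N with reduce_sum takes the same step as learning_rate with reduce_mean.
N = 1000
w_demo = tf.Variable([0.5])
x_demo = tf.constant(np.random.rand(N, 1).astype(np.float32))
err = w_demo * x_demo - 1.0

grad_sum = tf.gradients(tf.reduce_sum(tf.square(err)), w_demo)[0]
grad_mean = tf.gradients(tf.reduce_mean(tf.square(err)), w_demo)[0]

with tf.Session() as demo_sess:
    demo_sess.run(tf.global_variables_initializer())
    gs, gm = demo_sess.run([grad_sum, grad_mean])
    print(gs, gm * N)   # the two values agree (up to float error)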