2016-05-17

TensorFlow gradient is always zero

I have written a small TensorFlow program that convolves an image patch with the same convolution kernel num_unrollings times in a row, and then tries to minimize the mean squared difference between the resulting value and a target output. However, when I run the model with num_unrollings greater than 1, the gradient of my loss (tf_loss) with respect to the convolution kernel (tf_kernel) is zero, so no learning takes place.

Here is the most minimal code (Python 3) I could come up with that reproduces the problem; sorry for the length:

import tensorflow as tf
import numpy as np

batch_size = 1
kernel_size = 3
num_unrollings = 2

input_image_size = (kernel_size//2 * num_unrollings)*2 + 1

graph = tf.Graph()

with graph.as_default():
    # Input data
    tf_input_images = tf.random_normal(
        [batch_size, input_image_size, input_image_size, 1]
    )

    tf_outputs = tf.random_normal(
        [batch_size]
    )

    # Convolution kernel
    tf_kernel = tf.Variable(
        tf.zeros([kernel_size, kernel_size, 1, 1])
    )

    # Perform convolution(s)
    _convolved_input = tf_input_images
    for _ in range(num_unrollings):
        _convolved_input = tf.nn.conv2d(
            _convolved_input,
            tf_kernel,
            [1, 1, 1, 1],
            padding="VALID"
        )

    tf_prediction = tf.reshape(_convolved_input, shape=[batch_size])

    tf_loss = tf.reduce_mean(
        tf.squared_difference(
            tf_prediction,
            tf_outputs
        )
    )

    # FIXME: why is this gradient zero when num_unrollings > 1??
    tf_gradient = tf.concat(0, tf.gradients(tf_loss, tf_kernel))

# Calculate and report gradient
with tf.Session(graph=graph) as session:

    tf.initialize_all_variables().run()

    gradient = session.run(tf_gradient)

    print(gradient.reshape(kernel_size**2))
    # prints [ 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Thank you for your help!


Initializing the kernel with all zeros is not a good idea, and in this case it is what causes the zero gradient. – etarion
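etarion's point can be made concrete with a scalar analogy (a hypothetical toy model, not the original graph): after n unrollings the prediction is proportional to k**n, so by the chain rule the gradient of the squared loss carries a factor of n * k**(n-1), which vanishes at k = 0 whenever n > 1:

```python
# Toy model: prediction = k**n * x, loss = (prediction - y)**2.
# d(loss)/dk = 2*(k**n * x - y) * n * k**(n-1) * x,
# which is zero at k = 0 for any n > 1, but not for n = 1.
def loss_grad(k, x, y, n):
    pred = (k ** n) * x
    return 2.0 * (pred - y) * n * (k ** (n - 1)) * x

print(loss_grad(0.0, 1.0, 1.0, 1))  # n=1: nonzero gradient, learning possible
print(loss_grad(0.0, 1.0, 1.0, 2))  # n=2: gradient is zero, optimizer is stuck
```

This matches the observation above: with num_unrollings = 1 the zero-initialized kernel still receives a gradient, but with two or more chained convolutions every gradient term contains at least one factor of the (all-zero) kernel.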

Answer


Try something like replacing

# Convolution kernel
tf_kernel = tf.Variable(
    tf.zeros([kernel_size, kernel_size, 1, 1])
)

with

# Convolution kernel
tf_kernel = tf.Variable(
    tf.random_normal([kernel_size, kernel_size, 1, 1])
)
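The effect can also be checked without TensorFlow. The following is a rough NumPy sketch (the helpers conv2d_valid and num_grad are made up for this illustration) that finite-differences the same double-convolution loss: the numerical gradient at an all-zero kernel is zero, while a randomly initialized kernel gives a usable gradient:

```python
import numpy as np

def conv2d_valid(img, k):
    # Naive 2-D "VALID" convolution (no flipping, matching tf.nn.conv2d).
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

def loss(kernel, img, target, n):
    # Apply the same kernel n times, then squared error against the target.
    x = img
    for _ in range(n):
        x = conv2d_valid(x, kernel)
    return float((x.squeeze() - target) ** 2)

def num_grad(kernel, img, target, n, eps=1e-6):
    # Central finite-difference gradient of the loss w.r.t. each kernel entry.
    g = np.zeros_like(kernel)
    for idx in np.ndindex(kernel.shape):
        kp = kernel.copy(); kp[idx] += eps
        km = kernel.copy(); km[idx] -= eps
        g[idx] = (loss(kp, img, target, n) - loss(km, img, target, n)) / (2 * eps)
    return g

rng = np.random.RandomState(0)
img = rng.randn(5, 5)          # input_image_size = 5 for kernel_size=3, n=2
target = 1.0
zero_k = np.zeros((3, 3))
rand_k = rng.randn(3, 3)

print(np.abs(num_grad(zero_k, img, target, 2)).max())  # ~0: zero init is stuck
print(np.abs(num_grad(rand_k, img, target, 2)).max())  # nonzero: learning can proceed
```

In practice a small random initialization such as tf.truncated_normal with a modest stddev is the usual choice, for exactly this symmetry-breaking reason.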