2016-05-17

TensorFlow gradient is always zero

I have written a small TensorFlow program that convolves an image patch with the same convolution kernel num_unrollings times in a row, and then tries to minimize the mean squared difference between the resulting value and a target output. However, when I run the model with num_unrollings greater than 1, the gradient of my loss (tf_loss) with respect to the convolution kernel (tf_kernel) is zero, so no learning takes place.

Here is the most minimal code (Python 3) I could come up with that reproduces the problem; sorry for the length:

import tensorflow as tf
import numpy as np

batch_size = 1
kernel_size = 3
num_unrollings = 2

input_image_size = (kernel_size//2 * num_unrollings)*2 + 1

graph = tf.Graph()

with graph.as_default():
    # Input data
    tf_input_images = tf.random_normal(
        [batch_size, input_image_size, input_image_size, 1]
    )

    tf_outputs = tf.random_normal(
        [batch_size]
    )

    # Convolution kernel
    tf_kernel = tf.Variable(
        tf.zeros([kernel_size, kernel_size, 1, 1])
    )

    # Perform convolution(s)
    _convolved_input = tf_input_images
    for _ in range(num_unrollings):
        _convolved_input = tf.nn.conv2d(
            _convolved_input,
            tf_kernel,
            [1, 1, 1, 1],
            padding="VALID"
        )

    tf_prediction = tf.reshape(_convolved_input, shape=[batch_size])

    tf_loss = tf.reduce_mean(
        tf.squared_difference(
            tf_prediction,
            tf_outputs
        )
    )

    # FIXME: why is this gradient zero when num_unrollings > 1??
    tf_gradient = tf.concat(0, tf.gradients(tf_loss, tf_kernel))

# Calculate and report gradient
with tf.Session(graph=graph) as session:

    tf.initialize_all_variables().run()

    gradient = session.run(tf_gradient)

    print(gradient.reshape(kernel_size**2))
    # prints [ 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Thank you for your help!


Initializing the kernel with all zeros is not a good idea, and in this case it is what causes the zero gradient. – etarion
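etarion's point can be made concrete with a scalar analogy (a hypothetical toy model, not the original graph): after n unrollings the prediction is proportional to k**n, so by the chain rule the gradient of the squared loss carries a factor of n * k**(n-1), which vanishes at k = 0 whenever n > 1:

```python
# Toy model: prediction = k**n * x, loss = (prediction - y)**2.
# d(loss)/dk = 2*(k**n * x - y) * n * k**(n-1) * x,
# which is zero at k = 0 for any n > 1, but not for n = 1.
def loss_grad(k, x, y, n):
    pred = (k ** n) * x
    return 2.0 * (pred - y) * n * (k ** (n - 1)) * x

print(loss_grad(0.0, 1.0, 1.0, 1))  # n=1: nonzero gradient, learning possible
print(loss_grad(0.0, 1.0, 1.0, 2))  # n=2: gradient is zero, optimizer is stuck
```

This matches the observation above: with num_unrollings = 1 the zero-initialized kernel still receives a gradient, but with two or more chained convolutions every gradient term contains at least one factor of the (all-zero) kernel.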

Answer


Try something like replacing

# Convolution kernel
tf_kernel = tf.Variable(
    tf.zeros([kernel_size, kernel_size, 1, 1])
)

with

# Convolution kernel
tf_kernel = tf.Variable(
    tf.random_normal([kernel_size, kernel_size, 1, 1])
)
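The effect can also be checked without TensorFlow. The following is a rough NumPy sketch (the helpers conv2d_valid and num_grad are made up for this illustration) that finite-differences the same double-convolution loss: the numerical gradient at an all-zero kernel is zero, while a randomly initialized kernel gives a usable gradient:

```python
import numpy as np

def conv2d_valid(img, k):
    # Naive 2-D "VALID" convolution (no flipping, matching tf.nn.conv2d).
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

def loss(kernel, img, target, n):
    # Apply the same kernel n times, then squared error against the target.
    x = img
    for _ in range(n):
        x = conv2d_valid(x, kernel)
    return float((x.squeeze() - target) ** 2)

def num_grad(kernel, img, target, n, eps=1e-6):
    # Central finite-difference gradient of the loss w.r.t. each kernel entry.
    g = np.zeros_like(kernel)
    for idx in np.ndindex(kernel.shape):
        kp = kernel.copy(); kp[idx] += eps
        km = kernel.copy(); km[idx] -= eps
        g[idx] = (loss(kp, img, target, n) - loss(km, img, target, n)) / (2 * eps)
    return g

rng = np.random.RandomState(0)
img = rng.randn(5, 5)          # input_image_size = 5 for kernel_size=3, n=2
target = 1.0
zero_k = np.zeros((3, 3))
rand_k = rng.randn(3, 3)

print(np.abs(num_grad(zero_k, img, target, 2)).max())  # ~0: zero init is stuck
print(np.abs(num_grad(rand_k, img, target, 2)).max())  # nonzero: learning can proceed
```

In practice a small random initialization such as tf.truncated_normal with a modest stddev is the usual choice, for exactly this symmetry-breaking reason.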