
I am new to TensorFlow, so this question may be really silly: the parameters of my CNN built with the TensorFlow Layers API do not change during training.

I have been trying to write a simple CNN for the MNIST handwritten digit dataset with TensorFlow. The problem is that the parameters are not being updated by the optimizer (monitored via TensorBoard summaries).
The graph itself seems fine, even though the scopes created by the Layers API look a bit odd, and gradients are computed for every layer. Please help!

I am using the training data from here: http://yann.lecun.com/exdb/mnist/

Here is the code:

import tensorflow as tf 

DATA = 'train-images.idx3-ubyte' 
LABELS = 'train-labels.idx1-ubyte' 
NUM_EPOCHS = 2 
BATCH_SIZE = 15 
#Data definition 
data_queue = tf.train.string_input_producer([DATA,]) 
label_queue = tf.train.string_input_producer([LABELS,]) 

reader_data = tf.FixedLengthRecordReader(record_bytes=28*28, header_bytes = 16) 
reader_labels = tf.FixedLengthRecordReader(record_bytes=1, header_bytes = 8) 

(_,data_rec) = reader_data.read(data_queue) 
(_,label_rec) = reader_labels.read(label_queue) 

image = tf.decode_raw(data_rec, tf.uint8) 
image = tf.reshape(image, [28, 28, 1]) 
label = tf.decode_raw(label_rec, tf.uint8) 
label = tf.reshape(label, [1]) 

image_batch, label_batch = tf.train.shuffle_batch([image, label], 
               batch_size=BATCH_SIZE, 
               capacity=100, 
               min_after_dequeue = 30) 
#Layers definition 
conv = tf.layers.conv2d(
    inputs=tf.cast(image_batch, tf.float32), 
    filters=15, 
    kernel_size=[5,5], 
    padding='same', 
    activation=tf.nn.relu) 

conv1 = tf.layers.conv2d(
    inputs=conv, 
    filters=15, 
    kernel_size=[3,3], 
    padding='same', 
    activation=tf.nn.relu) 

pool_flat = tf.reshape(conv1, [BATCH_SIZE, -1]) 

dense1 = tf.layers.dense(inputs=pool_flat, units=30, activation=tf.nn.relu) 

output = tf.nn.softmax(tf.layers.dense(inputs=dense1, units=10)) 

#train operation definition 
onehot_labels = tf.one_hot(indices=tf.cast(tf.reshape(label_batch,[-1]), tf.int32), depth=10) 

loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, 
             logits=output) 

global_step = tf.Variable(0,name='global_step',trainable=False) 
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(loss, global_step = global_step) 

#Summaries definition 

for var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='conv2d'): 
    tf.summary.histogram(var.name, var) 
for var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='conv2d_1'): 
    tf.summary.histogram(var.name, var) 
for var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='dense'): 
    tf.summary.histogram(var.name, var) 
for var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='dense_1'): 
    tf.summary.histogram(var.name, var) 
tf.summary.image("inp", image_batch, max_outputs =1) 
loss_summary = tf.summary.scalar("loss", loss) 
summaries = tf.summary.merge_all() 

#init 
sess = tf.Session() 
summary_writer = tf.summary.FileWriter('log_simple_stats', sess.graph) 
coord = tf.train.Coordinator() 
threads = tf.train.start_queue_runners(coord=coord, sess=sess) 
sess.run(tf.global_variables_initializer()) 

#loop 
for i in range((60000*NUM_EPOCHS)//BATCH_SIZE): 
    sess.run(train_op) 
    if(i%100): 
     merged = sess.run(summaries) 
     summary_writer.add_summary(merged, i) 


coord.request_stop() 
coord.join(threads) 
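One way to check, independently of TensorBoard, whether the weights really stay constant is to snapshot a kernel before and after a training step and compare the values directly. A minimal diagnostic sketch, assuming the graph and session built above (the kernel lookup by the default tf.layers scope name is an assumption):

import numpy as np

# pick the kernel of the first tf.layers.conv2d layer (default variable name 'conv2d/kernel')
kernel = [v for v in tf.trainable_variables() if v.name.startswith('conv2d/kernel')][0]

before = sess.run(kernel)   # snapshot of the weights
sess.run(train_op)          # one optimization step
after = sess.run(kernel)    # snapshot again

# if the optimizer is updating this layer, the difference should be non-zero
print('max absolute change:', np.abs(after - before).max())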

EDIT: Custom layers give the same result.

Custom layers:

def convol(input, inp, outp, name="conv"):
    with tf.name_scope(name):
        w = tf.Variable(tf.truncated_normal([5, 5, inp, outp], stddev=0.1), name="W")
        b = tf.Variable(tf.constant(0.1, shape=[outp]), name="B")
        filtered = tf.nn.conv2d(input, w, strides=[1,1,1,1], padding="SAME", name="conv2d")
        activation = tf.nn.relu(features=(filtered+b), name="activation")
        tf.summary.histogram(name=w.name, values=w)
        tf.summary.histogram(name=b.name, values=b)
        tf.summary.histogram(name=activation.name, values=activation)
        return activation

def dense(input, inp, outp, name="dense"):
    with tf.name_scope(name):
        w = tf.Variable(tf.truncated_normal([inp, outp], stddev=0.1), name="W")
        b = tf.Variable(tf.constant(0.1, shape=[outp]), name="B")
        act = tf.matmul(input, w) + b
        tf.summary.histogram(name=w.name, values=w)
        tf.summary.histogram(name=b.name, values=b)
        tf.summary.histogram(name="activation", values=act)
        return act
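For reference, this is roughly how the custom layers above would replace the tf.layers calls in the original graph (a sketch only; shapes follow the question's code, and note the custom convol hardcodes a 5x5 kernel):

# cast the uint8 batch to float, as in the original graph
inputs = tf.cast(image_batch, tf.float32)

conv = convol(inputs, inp=1, outp=15, name="conv")     # [BATCH_SIZE, 28, 28, 15]
conv1 = convol(conv, inp=15, outp=15, name="conv1")    # second conv layer
pool_flat = tf.reshape(conv1, [BATCH_SIZE, -1])        # flatten to [BATCH_SIZE, 28*28*15]

dense1 = tf.nn.relu(dense(pool_flat, inp=28*28*15, outp=30, name="dense1"))
logits = dense(dense1, inp=30, outp=10, name="dense2")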

EDIT:

So after messing around with this and the MNIST example from TF for a while, I noticed that the weights were not being learned. The way I was reading the data had messed up the gradient computation. I simply plugged the class that reads the MNIST dataset into my code, and it works 100% without any tuning of the parameters.
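For anyone hitting the same issue: swapping the hand-rolled queue pipeline for the MNIST input class that ships with the TensorFlow 1.x examples looks roughly like this (a feed_dict-based sketch; the placeholder names are illustrative, and the rest of the graph from the question is assumed):

from tensorflow.examples.tutorials.mnist import input_data

# reads the same IDX files and handles shuffling/batching internally
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

image_ph = tf.placeholder(tf.float32, [None, 784])
label_ph = tf.placeholder(tf.float32, [None, 10])
# ... build the conv/dense layers on tf.reshape(image_ph, [-1, 28, 28, 1]) ...

for i in range((60000 * NUM_EPOCHS) // BATCH_SIZE):
    batch_images, batch_labels = mnist.train.next_batch(BATCH_SIZE)
    sess.run(train_op, feed_dict={image_ph: batch_images, label_ph: batch_labels})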

Answer


I had the same problem with my convnet, and in my case the cause was nothing more than a combination of the following:

  • Not enough iterations: it can take quite a while (tens of thousands of iterations) before the changes become visible.
  • Model too complex: drastically reduce the number of filters for the first tests, then slowly increase them to fit your use case, to make sure it is nothing else.

For easier debugging, try visualizing your filters with TensorBoard; this gist helped me a lot:

https://gist.github.com/kukuruza/03731dc494603ceab0c5
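The underlying idea is simply to rearrange a convolution kernel into a batch of per-filter images and log them with tf.summary.image. A rough sketch, assuming the default tf.layers variable name and a single-channel (grayscale) input as with MNIST:

# kernel of the first conv layer, shape [5, 5, in_channels, out_channels]
kernel = [v for v in tf.trainable_variables() if v.name.startswith('conv2d/kernel')][0]

# move the output-channel axis to the front: one image per filter
filters = tf.transpose(kernel, [3, 0, 1, 2])   # [out_channels, 5, 5, in_channels]
tf.summary.image('conv1_filters', filters, max_outputs=15)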

Either of your approaches (tf.layers and manually created variables) should hook up to the train_op correctly, so I don't think anything is wrong there.
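To double-check that every layer really is wired into the loss, one can also ask TensorFlow for the gradients explicitly; a None entry would mean that variable is disconnected from the train_op. A quick check, assuming the loss from the question:

grads = tf.gradients(loss, tf.trainable_variables())
for var, grad in zip(tf.trainable_variables(), grads):
    print(var.name, 'gradient is', 'MISSING' if grad is None else 'connected')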


Thanks for your answer! I will try running it with a smaller model, but I suspect that is not the case here. This is after 1.8 epochs of data (100,000 runs) passed to train_op, and it did not change anything. [graph_from_tensorboard](https://ibb.co/bSt3X5) –
