2017-03-09 172 views

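TensorFlow error: restoring a checkpoint file

I built my own convolutional neural network in which I track the moving averages of all trainable variables (TensorFlow 1.0):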

variable_averages = tf.train.ExponentialMovingAverage(
     0.9999, global_step) 
variables_averages_op = variable_averages.apply(tf.trainable_variables()) 
train_op = tf.group(apply_gradient_op, variables_averages_op) 
saver = tf.train.Saver(tf.global_variables(), max_to_keep=10) 
summary_op = tf.summary.merge(summaries) 
init = tf.global_variables_initializer() 
sess = tf.Session(config=tf.ConfigProto(
     allow_soft_placement=True, 
     log_device_placement=False)) 
sess.run(init) 
# start queue runners 
tf.train.start_queue_runners(sess=sess) 

summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph) 

# training loop 
start_time = time.time() 
for step in range(FLAGS.max_steps):
    _, loss_value = sess.run([train_op, loss])
    duration = time.time() - start_time
    start_time = time.time()
    assert not np.isnan(loss_value), 'Model diverged with loss = NaN'

    if step % 1 == 0:
        # print current model status
        num_examples_per_step = FLAGS.batch_size * FLAGS.num_gpus
        examples_per_sec = num_examples_per_step / duration
        sec_per_batch = duration / FLAGS.num_gpus
        format_str = '{} step{}, loss {}, {} examples/sec, {} sec/batch'
        print(format_str.format(datetime.now(), step, loss_value, examples_per_sec, sec_per_batch))
    if step % 50 == 0:
        summary_str = sess.run(summary_op)
        summary_writer.add_summary(summary_str, step)
    # range() stops at max_steps - 1, so that is the last step to check for
    if step % 10 == 0 or step == FLAGS.max_steps - 1:
        print('save checkpoint')
        # save checkpoint file
        checkpoint_file = os.path.join(FLAGS.train_dir, 'model.ckpt')
        saver.save(sess, checkpoint_file, global_step=step)
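
For readers unfamiliar with what the moving average actually stores, here is a minimal plain-Python sketch of the update rule documented for tf.train.ExponentialMovingAverage. It is not TensorFlow's implementation. The function name `ema_update` and the sample values are made up for illustration.

```python
def ema_update(shadow, value, decay, num_updates=None):
    """Return the new shadow (averaged) value after one update.

    Hypothetical helper: mirrors the documented rule
    shadow = decay * shadow + (1 - decay) * value.
    """
    if num_updates is not None:
        # When a step counter is passed, a smaller decay is used early on
        # so the average moves faster at the start of training.
        decay = min(decay, (1.0 + num_updates) / (10.0 + num_updates))
    return decay * shadow + (1.0 - decay) * value


if __name__ == "__main__":
    shadow = 1.0  # assumed: the shadow starts at the variable's initial value
    for step, value in enumerate([2.0, 2.0, 2.0]):
        shadow = ema_update(shadow, value, decay=0.9999, num_updates=step)
        print(step, shadow)
```

Each trainable variable gets one such shadow value, saved in the checkpoint under the variable's name plus the suffix /ExponentialMovingAverage.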

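This works fine and the checkpoint files are saved (saver version V2). I then try to restore the checkpoint in a separate script that evaluates the model. There I have this piece of code: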

# Restore the moving average version of the learned variables for eval. 
variable_averages = tf.train.ExponentialMovingAverage(
    MOVING_AVERAGE_DECAY) 
variables_to_restore = variable_averages.variables_to_restore() 
saver = tf.train.Saver(variables_to_restore) 

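There I get the error "NotFoundError (see above for traceback): Key conv1/variable/ExponentialMovingAverage not found in checkpoint", where conv1/variable/ is a variable scope.

The error occurs even before I try to restore the variables. Can you help me solve this?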
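
One way to narrow down an error like this is to compare the keys the saver will ask for with the keys the checkpoint actually holds. The sketch below does that bookkeeping in plain Python. It only imitates the idea behind variables_to_restore() (averaged variables are looked up under name + "/ExponentialMovingAverage", the rest under their own name) and is not the library's code. The helper names and all variable names in the example are hypothetical.

```python
EMA_SUFFIX = "/ExponentialMovingAverage"


def restore_keys(averaged_names, other_names=()):
    """Map checkpoint key -> graph variable name (hypothetical helper)."""
    keys = {name + EMA_SUFFIX: name for name in averaged_names}
    keys.update({name: name for name in other_names})
    return keys


def missing_keys(wanted_keys, checkpoint_keys):
    """Return the wanted keys that the checkpoint does not contain."""
    return sorted(set(wanted_keys) - set(checkpoint_keys))


if __name__ == "__main__":
    # Hypothetical contents of a checkpoint written by the training script.
    checkpoint_keys = {
        "conv1/variable",
        "conv1/variable/ExponentialMovingAverage",
        "global_step",
    }
    # Hypothetical variables present in the evaluation graph.
    wanted = restore_keys(
        averaged_names=["conv1/variable", "conv2/variable"],
        other_names=["global_step"],
    )
    for key in missing_keys(wanted, checkpoint_keys):
        print("not found in checkpoint:", key)
```

In TensorFlow 1.x the real checkpoint keys can be listed with tf.train.NewCheckpointReader(path).get_variable_to_shape_map(), and the wanted keys are simply the keys of the dict returned by variables_to_restore().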

Thanks in advance

TheJude

Answer


I solved it this way:
Call tf.reset_default_graph() before creating the second ExponentialMovingAverage(...) in the graph.

# reset the default graph before creating a new EMA object
tf.reset_default_graph() 
# Restore the moving average version of the learned variables for eval. 
variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY) 
variables_to_restore = variable_averages.variables_to_restore() 
saver = tf.train.Saver(variables_to_restore) 
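
The answer does not say why the reset helps. One possible explanation, offered purely as an assumption, is that a default graph which already holds nodes (for example from an earlier model build in the same process) hands out different names to anything created later, so the keys requested at restore time no longer match the ones in the checkpoint. The toy class below imitates that kind of name uniquification in plain Python. `ToyGraph` is a made-up stand-in and not a TensorFlow class.

```python
class ToyGraph:
    """Made-up stand-in for a graph that keeps node names unique."""

    def __init__(self):
        self._name_counts = {}

    def unique_name(self, name):
        # First use keeps the name; later uses get _1, _2, ... appended.
        count = self._name_counts.get(name, 0)
        self._name_counts[name] = count + 1
        return name if count == 0 else "{}_{}".format(name, count)


if __name__ == "__main__":
    stale = ToyGraph()
    print(stale.unique_name("conv1/variable"))  # first build of the model
    print(stale.unique_name("conv1/variable"))  # second build, same graph

    fresh = ToyGraph()  # plays the role of tf.reset_default_graph()
    print(fresh.unique_name("conv1/variable"))
```

Under this assumption, starting from an empty graph ensures the evaluation script creates its variables under the same names the training script saved.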

It took me two hours...
