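TensorFlow: NotFoundError: Key not found in checkpoint
I have been training a TensorFlow model for about a week, fine-tuning it occasionally.
Today, when I tried to fine-tune the model, I got this error: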
tensorflow.python.framework.errors_impl.NotFoundError: Key conv_classifier/loss/total_loss/avg not found in checkpoint
[[Node: save/RestoreV2_37 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_37/tensor_names, save/RestoreV2_37/shape_and_slices)]]
Using inspect_checkpoint.py, I can see that the checkpoint file now has two empty layers in it (the entries listed with shape []):
...
conv_decode4/ort_weights/Momentum (DT_FLOAT) [7,7,64,64]
loss/cross_entropy/avg (DT_FLOAT) []
loss/total_loss/avg (DT_FLOAT) []
up1/up_filter (DT_FLOAT) [2,2,64,64]
...
How can I solve this?
SOLUTION: edited below for clarity
Following mrry's suggestion:
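import tensorflow as tf

# Start from the default mapping: every variable is saved under its own op name.
code_to_checkpoint_variable_map = {var.op.name: var for var in tf.global_variables()}
# Re-key the variables whose name in the code differs from the key in the checkpoint.
for code_variable_name, checkpoint_variable_name in {
    "inference/conv_classifier/weight_loss/avg": "loss/weight_loss/avg",
    "inference/conv_classifier/loss/total_loss/avg": "loss/total_loss/avg",
    "inference/conv_classifier/loss/cross_entropy/avg": "loss/cross_entropy/avg",
}.items():
    code_to_checkpoint_variable_map[checkpoint_variable_name] = code_to_checkpoint_variable_map[code_variable_name]
    del code_to_checkpoint_variable_map[code_variable_name]
# The Saver looks up each dictionary key in the checkpoint and restores it into the mapped variable.
# `sess` is the already-open tf.Session for this graph.
saver = tf.train.Saver(code_to_checkpoint_variable_map)
saver.restore(sess, tf.train.latest_checkpoint('./logs'))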
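The renaming above is written out by hand. As a rough sketch only (not part of mrry's suggestion), the same checkpoint-key-to-variable pairing could be derived by matching each variable name in the code against checkpoint keys that form its trailing scope path. The helper name build_restore_map is hypothetical, and checkpoint_keys below contains only keys visible in the inspect_checkpoint.py excerpt, so a real checkpoint may hold more. The sketch works on plain strings and needs no TensorFlow:

```python
def build_restore_map(graph_names, checkpoint_keys):
    """Pair checkpoint keys with the graph variable names they should restore.

    Returns (mapping, unmatched): mapping is {checkpoint_key: graph_name};
    unmatched lists graph names with no unambiguous checkpoint key.
    """
    keys = set(checkpoint_keys)
    # Names that are identical on both sides map to themselves.
    mapping = {name: name for name in graph_names if name in keys}
    unmatched = []
    for name in graph_names:
        if name in keys:
            continue
        # Unused checkpoint keys that are a trailing scope path of the graph name,
        # e.g. "loss/total_loss/avg" for ".../conv_classifier/loss/total_loss/avg".
        candidates = sorted(
            k for k in keys if name.endswith("/" + k) and k not in mapping
        )
        if len(candidates) == 1:
            mapping[candidates[0]] = name
        else:
            unmatched.append(name)  # no match or ambiguous: resolve by hand
    return mapping, unmatched


# Variable names as they appear in the code (taken from the snippet above).
graph_names = [
    "up1/up_filter",
    "inference/conv_classifier/loss/total_loss/avg",
    "inference/conv_classifier/loss/cross_entropy/avg",
    "inference/conv_classifier/weight_loss/avg",
]
# Assumption: only the keys visible in the inspect_checkpoint.py excerpt.
checkpoint_keys = ["up1/up_filter", "loss/total_loss/avg", "loss/cross_entropy/avg"]

mapping, unmatched = build_restore_map(graph_names, checkpoint_keys)
print(mapping)
print(unmatched)
```

In a real session the two input lists would presumably come from [v.op.name for v in tf.global_variables()] and [name for name, _ in tf.train.list_variables(checkpoint_path)], and the resulting names would be swapped for the variable objects before being passed to tf.train.Saver. Anything left in unmatched has no corresponding key in the checkpoint and would need to be initialized rather than restored.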
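Is the key in the checkpoint file "loss/weight_loss/avg"? The first exception message suggests that it isn't. (I don't understand the other modifications you made, but the full code block looks reasonable.) – mrry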