2017-01-07
1

Inception retraining problem: "Nan in summary histogram for: HistogramSummary"

I'm trying to retrain Inception v3 on my RPi3, and I get this histogram error message.

python /home/pi/Tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py --bottleneck_dir=/home/pi/Documents/Machine\ Learning/Inception/tf_files/bottlenecks --how_many_training_steps 500 --model_dir=/home/pi/Documents/Machine\ Learning/Inception/tf_files/inception --output_graph=/home/pi/Documents/Machine\ Learning/Inception/tf_files/retrained_graph.pb --output_labels=/home/pi/Documents/Machine\ Learning/Inception/tf_files/retrained_labels.txt --image_dir /home/pi/Documents/Machine\ Learning/Inception/Retraining_Images 
Looking for images in 'Granny Smith Apple' 
Looking for images in 'Red Delicious' 
100 bottleneck files created. 
200 bottleneck files created. 
2017-01-07 11:30:22.180768: Step 0: Train accuracy = 56.0% 
2017-01-07 11:30:22.242166: Step 0: Cross entropy = nan 
2017-01-07 11:30:22.850969: Step 0: Validation accuracy = 50.0% 
Traceback (most recent call last): 
    File "/home/pi/Tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py", line 938, in <module> 
    tf.app.run() 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run 
    sys.exit(main(sys.argv[:1] + flags_passthrough)) 
    File "/home/pi/Tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py", line 887, in main 
    ground_truth_input: train_ground_truth}) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 717, in run 
    run_metadata_ptr) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 915, in _run 
    feed_dict_string, options, run_metadata) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 965, in _do_run 
    target_list, options, run_metadata) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 985, in _do_call 
    raise type(e)(node_def, op, message) 
tensorflow.python.framework.errors.InvalidArgumentError: Nan in summary histogram for: HistogramSummary 
    [[Node: HistogramSummary = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](HistogramSummary/tag, final_result)]] 

Caused by op u'HistogramSummary', defined at: 
    File "/home/pi/Tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py", line 938, in <module> 
    tf.app.run() 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run 
    sys.exit(main(sys.argv[:1] + flags_passthrough)) 
    File "/home/pi/Tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py", line 846, in main 
    bottleneck_tensor) 
    File "/home/pi/Tensorflow/tensorflow/tensorflow/examples/image_retraining/retrain.py", line 764, in add_final_training_ops 
    tf.histogram_summary(final_tensor_name + '/activations', final_tensor) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/logging_ops.py", line 100, in histogram_summary 
    tag=tag, values=values, name=scope) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 100, in _histogram_summary 
    name=name) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 749, in apply_op 
    op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2380, in create_op 
    original_op=self._default_original_op, op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1298, in __init__ 
    self._traceback = _extract_stack() 

InvalidArgumentError (see above for traceback): Nan in summary histogram for: HistogramSummary 
    [[Node: HistogramSummary = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](HistogramSummary/tag, final_result)]] 

I tried changing retrain.py to read merged = tf.merge_all_summaries() after reading this, but it didn't work.

Also, the first time I tried retraining, I got different results for step 0 before hitting an error:

2017-01-07 11:13:36.548913: Step 0: Train accuracy = 89.0% 
2017-01-07 11:13:36.555770: Step 0: Cross entropy = 0.590778 
2017-01-07 11:13:37.052190: Step 0: Validation accuracy = 76.0% 

Were you able to solve your problem? I have the same issue and can't find a solution. – Gegenwind

Answers

3

It sounds like it would help to know where the NaN values come from. For that, take a look at the TensorFlow debugger (tfdbg): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/how_tos/debugger/index.md

In your retrain.py, you can do something like

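from tensorflow.python import debug as tf_debug

# ...
# In def main(_), after the session `sess` has been created.
# `debug` is only an example switch; define it yourself (e.g. as a constant).
if debug:
    # Wrap the session so that sess.run() calls drop into the tfdbg CLI.
    sess = tf_debug.LocalCLIDebugWrapperSession(sess)
    # Register the filter that flags tensors containing inf or NaN.
    sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)

# ...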

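After this change, when sess.run() is called during training and evaluation, you will be dropped into the debugger's command-line interface. At the tfdbg> prompt, you can enter the following command to let the code run until any NaNs or infinities appear in the TensorFlow graph: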

tfdbg> run -f has_inf_or_nan 

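When the tensor filter has_inf_or_nan is hit, the interface will give you a list of the tensors that contain infs or NaNs, sorted in chronological order. The one at the top is likely the "culprit", i.e., the one that first generated the bad numerical values. Say its name is node_1; you can then use the following tfdbg commands to look at its inputs and node attributes: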

tfdbg> li -r node_1 
tfdbg> ni -a node_1 
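
To make it clearer what the has_inf_or_nan filter described above is looking for, here is a minimal pure-Python sketch of the same kind of check. It is not tfdbg's implementation: the function name contains_inf_or_nan and the sample values are made up for illustration, and a real tensor would be an array rather than nested lists.

```python
import math


def contains_inf_or_nan(values):
    """Return True if any number in a (possibly nested) list is inf or NaN."""
    for v in values:
        if isinstance(v, (list, tuple)):
            # Recurse into nested rows, mimicking a multi-dimensional tensor.
            if contains_inf_or_nan(v):
                return True
        elif math.isnan(v) or math.isinf(v):
            return True
    return False


# Hypothetical activations: the second row contains a NaN.
activations = [[0.2, 0.8], [float("nan"), 0.5]]
print(contains_inf_or_nan(activations))
```

A tensor for which a check like this returns True is the kind of tensor that ends up in the list shown after run -f has_inf_or_nan.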

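Note: since the tfdbg debugger is a recently added feature (around December 2016), you may need to sync to the TensorFlow master branch and/or download the latest binaries to access it. – scai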


Is there a flag I need to add in the Python code? Even after adding "--debug", I get the error "if debug: NameError: global name 'debug' is not defined". I'm using TF.11. – user7388993


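@user7388993 The "debug" flag is just an example. You can define it as a constant in your Python code. If you don't want that switch, you can simply leave out the "if debug" line. – scai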
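
As a rough illustration of the point in the comment above, the sketch below defines a --debug switch with the standard argparse module so that the name debug exists before the "if debug" line is reached. This is an assumption for illustration only: the helper parse_flags is hypothetical, and retrain.py's own flag handling may differ between TensorFlow versions.

```python
import argparse


def parse_flags(argv):
    """Parse a hypothetical --debug switch; other flags are passed through."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--debug",
        action="store_true",
        help="Wrap the session with the tfdbg command-line debugger.",
    )
    # parse_known_args leaves flags it does not know about in `unparsed`.
    flags, unparsed = parser.parse_known_args(argv)
    return flags, unparsed


# Example invocation with one known and one unknown flag.
flags, unparsed = parse_flags(["--debug", "--how_many_training_steps", "500"])
debug = flags.debug  # the name tested by the "if debug:" line in the answer
print(debug, unparsed)
```

The simpler alternative mentioned in the comment is a plain constant such as debug = True near the top of the script.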

1

If you are using tf.contrib.learn, you would use the following instead:

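from tensorflow.python import debug as tf_debug

# Hook that opens the tfdbg command-line interface during fit().
debug_hook = tf_debug.LocalCLIDebugHook()
debug_hook.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
hooks = [debug_hook]
...
classifier.fit(..., monitors=hooks)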