TensorFlow: skip the update if a gradient is NaN

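I have a deep model that I train on CIFAR-10. Training works fine on the CPU. However, when I use GPU support, the gradients become NaN for some batches (I checked this with tf.check_numerics). It happens at random, but fairly early in training. I believe the problem is related to my GPU.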

My question is: is there a way to skip the update if at least one gradient contains NaN, and force the model to move on to the next batch?

Edit: perhaps I should elaborate on my question.

This is how I apply the gradients:

with tf.control_dependencies([tf.check_numerics(grad, message='Gradient %s check failed, possible NaNs' % var.name) for grad, var in grads]):
    # Apply the gradients to adjust the shared variables.
    apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)

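My idea was to use tf.check_numerics first to verify whether any gradient contains NaN and then, if there are NaNs (the check fails), to "pass" without calling opt.apply_gradients. However, is there a way to catch the error raised inside tf.control_dependencies?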

2017-09-13 D.Badawi

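I was able to figure it out, although not in the most elegant way. My solution is as follows:
1) First, check all the gradients.
2) If the gradients contain no NaNs, apply them.
3) Otherwise, apply a fake update (using zero values), which requires a gradient override.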
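
The control flow of these three steps can be sketched outside TensorFlow, as plain Python on lists of floats. This is only an illustration of the logic, not the graph code from this answer: the names all_finite and sgd_step_or_skip are hypothetical, the update rule is assumed to be plain SGD, and the check treats Inf the same as NaN, as tf.check_numerics does.

```python
import math


def all_finite(grads):
    """Return True when no gradient value is NaN or infinite."""
    return all(math.isfinite(g) for g in grads)


def sgd_step_or_skip(params, grads, learning_rate):
    """Apply one plain SGD step, or leave params unchanged if any gradient is non-finite.

    Returns the (possibly unchanged) parameters and a flag telling
    whether the update was applied.
    """
    if not all_finite(grads):
        # At least one gradient is NaN/Inf: skip this batch.
        return list(params), False
    updated = [p - learning_rate * g for p, g in zip(params, grads)]
    return updated, True


params = [1.0, 2.0]

# A batch with finite gradients is applied.
params, applied = sgd_step_or_skip(params, [0.5, -1.0], learning_rate=0.5)

# A batch with a NaN gradient leaves the parameters as they were.
params, applied_nan = sgd_step_or_skip(params, [float("nan"), 1.0], learning_rate=0.5)
```

In this sketch the bad batch is skipped outright rather than replaced by a zero-valued update; for plain SGD the two have the same effect on the parameters.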

Here is my code.

First, define the custom gradient:

@tf.RegisterGradient("ZeroGrad") 
def _zero_grad(unused_op, grad): 
    return tf.zeros_like(grad)

Then define the exception-handling function:

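# This is added for the gradient check of NaNs.
def check_numerics_with_exception(grad, var):
    # In graph mode tf.check_numerics only adds an op here; a failed check
    # is raised when that op runs in a session, not while the graph is built.
    try:
        tf.check_numerics(grad, message='Gradient %s check failed, possible NaNs' % var.name)
    except Exception:
        return tf.constant(False, shape=())
    else:
        return tf.constant(True, shape=())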

Then create the conditional node:

# Number of gradients whose check reported a problem.
num_nans_grads = tf.Variable(1.0, name='num_nans_grads')
check_all_numeric_op = tf.reduce_sum(tf.cast(tf.stack([tf.logical_not(check_numerics_with_exception(grad, var)) for grad, var in grads]), dtype=tf.float32))

with tf.control_dependencies([tf.assign(num_nans_grads, check_all_numeric_op)]):
    # Apply the gradients to adjust the shared variables.
    def fn_true_apply_grad(grads, global_step):
        apply_gradients_true = opt.apply_gradients(grads, global_step=global_step)
        return apply_gradients_true

    def fn_false_ignore_grad(grads, global_step):
        # print('batch update ignored due to nans, fake update is applied')
        g = tf.get_default_graph()
        # The override map only affects gradients built inside this block;
        # grads was computed earlier and is passed to apply_gradients as is.
        with g.gradient_override_map({"Identity": "ZeroGrad"}):
            for (grad, var) in grads:
                tf.assign(var, tf.identity(var, name="Identity"))
            apply_gradients_false = opt.apply_gradients(grads, global_step=global_step)
        return apply_gradients_false

    # Take the real update only when no gradient was flagged.
    apply_gradient_op = tf.cond(tf.equal(num_nans_grads, 0.), lambda: fn_true_apply_grad(grads, global_step), lambda: fn_false_ignore_grad(grads, global_step))
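
Whether a zero-valued fake update really leaves the variables untouched depends on the optimizer, and the code above does not say which optimizer opt is. A minimal plain-Python sketch of the difference, assuming the usual momentum rule (velocity = momentum * velocity + grad, then param -= learning_rate * velocity); the function names are hypothetical and this is not TensorFlow code:

```python
def sgd_update(param, grad, learning_rate):
    """Plain SGD: the step is proportional to the current gradient only."""
    return param - learning_rate * grad


def momentum_update(param, velocity, grad, learning_rate, momentum):
    """SGD with momentum: the step also depends on the accumulated velocity."""
    velocity = momentum * velocity + grad
    param = param - learning_rate * velocity
    return param, velocity


# Plain SGD: a zero gradient leaves the parameter exactly where it was.
sgd_after_fake = sgd_update(1.0, 0.0, learning_rate=0.5)

# Momentum: one real step builds up velocity ...
param, velocity = momentum_update(1.0, 0.0, 2.0, learning_rate=0.5, momentum=0.5)

# ... and the following zero-gradient "fake" step still moves the parameter,
# because the decayed velocity from the earlier step is applied.
param_after_fake, velocity_after_fake = momentum_update(
    param, velocity, 0.0, learning_rate=0.5, momentum=0.5
)
```

With plain SGD the zero gradient is a true no-op, while with momentum the parameter still changes on the fake step. In either case, passing global_step to apply_gradients increments the step counter even for the fake update.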

2017-09-13 17:06:19
