TensorFlow: How to replace or modify a gradient?

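I would like to replace or modify the gradient of an op, or of a portion of the graph, in TensorFlow. It would be ideal if I could use the existing gradient in the calculation.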

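In some ways this is the opposite of what tf.stop_gradient() does: instead of adding a calculation that is ignored when computing gradients, I want a calculation that is used only when computing gradients.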

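A simple example would be something that scales gradients by multiplying them by a constant (without multiplying the forward computation by that constant). Another example would be something that clips the gradients to a given range.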

2017-05-08 Alex I

First, define your custom gradient:

@tf.RegisterGradient("CustomGrad") 
def _const_mul_grad(unused_op, grad): 
    return 5.0 * grad

Since you want nothing to happen in the forward pass, override the gradient of an identity operation with your new gradient:

g = tf.get_default_graph() 
with g.gradient_override_map({"Identity": "CustomGrad"}): 
    output = tf.identity(input, name="Identity")

Here is a working example of a layer that clips gradients in the backward pass and does nothing in the forward pass, using the same method:

import tensorflow as tf 

@tf.RegisterGradient("CustomClipGrad") 
def _clip_grad(unused_op, grad): 
    return tf.clip_by_value(grad, -0.1, 0.1) 

input = tf.Variable([3.0], dtype=tf.float32) 

g = tf.get_default_graph() 
with g.gradient_override_map({"Identity": "CustomClipGrad"}): 
    output_clip = tf.identity(input, name="Identity") 
grad_clip = tf.gradients(output_clip, input) 

# output without gradient clipping in the backwards pass for comparison: 
output = tf.identity(input) 
grad = tf.gradients(output, input) 

with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer()) 
    print("with clipping:", sess.run(grad_clip)[0]) 
    print("without clipping:", sess.run(grad)[0])

2017-05-13 03:18:11 BlueSun

Does this modify the gradient throughout the chain or not? –

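@KevinP Taking clipping as the example: the gradient is clipped only once, during the backward pass of the identity op. But all the layers in the chain are affected, because each layer uses the gradient of the layer after it. The preceding layers are not themselves clipped again. – BlueSun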

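Thanks. The whole backprop-versus-forward distinction made the question more confusing than intended. I meant later in the backprop gradient chain. –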
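BlueSun's point in the comments above can be checked by hand on a tiny chain. The following is a minimal plain-Python sketch, not TensorFlow code: the chain `x -> a = 2 * x -> identity(a) -> y = 10 * a` and the helper `clip` are made up for illustration, and the clipping range matches the answer's `-0.1, 0.1`.

```python
def clip(value, low, high):
    """Clamp a scalar into the range [low, high]."""
    return max(low, min(high, value))

# Hypothetical forward chain: x -> a = 2 * x -> identity(a) -> y = 10 * a
# The backward pass walks the same chain in reverse, applying the chain rule.
grad_y = 1.0                                  # dy/dy
grad_into_identity = grad_y * 10.0            # gradient arriving at the identity op
grad_at_identity = clip(grad_into_identity, -0.1, 0.1)  # clipped once, here only
grad_x = grad_at_identity * 2.0               # earlier layer reuses the clipped value

grad_x_unclipped = grad_into_identity * 2.0   # what x would receive without the override

print(grad_at_identity)   # 0.1
print(grad_x)             # 0.2
print(grad_x_unclipped)   # 20.0
```

The gradient reaching `x` (0.2) is larger than the clip bound, which shows that the earlier layer is affected by the clipping but is not clipped again itself.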

Use optimizer.compute_gradients or tf.gradients to get the original gradients,
then do whatever you want with them,
and finally use optimizer.apply_gradients.
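The three steps above can be sketched without TensorFlow to show the shape of the pattern. This is a minimal plain-Python sketch and not the TensorFlow API: `compute_gradients`, `apply_gradients`, `clip`, `TARGETS`, and the quadratic loss are hypothetical stand-ins for `optimizer.compute_gradients` and `optimizer.apply_gradients`.

```python
# Hypothetical loss: sum over parameters of (p - target)^2
TARGETS = {"w": 1.0, "b": -4.0}

def compute_gradients(params):
    """Step 1: return (gradient, name) pairs; d/dp (p - t)^2 = 2 * (p - t)."""
    return [(2.0 * (params[name] - TARGETS[name]), name) for name in params]

def clip(value, low, high):
    """Clamp a scalar into the range [low, high]."""
    return max(low, min(high, value))

def apply_gradients(params, grads_and_names, learning_rate):
    """Step 3: plain gradient-descent update using the (modified) gradients."""
    for grad, name in grads_and_names:
        params[name] -= learning_rate * grad

params = {"w": 3.0, "b": 0.0}
raw = compute_gradients(params)                       # [(4.0, "w"), (8.0, "b")]
clipped = [(clip(g, -5.0, 5.0), n) for g, n in raw]   # step 2: edit the gradients
apply_gradients(params, clipped, learning_rate=0.1)

print(raw)
print(clipped)
print(params)
```

Note that what gets edited here is the end-to-end gradient of the loss with respect to each variable, not the gradient of an individual op inside the graph.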

I found an example on GitHub.

2017-05-09 03:27:05 xxi

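Thank you, this is interesting. I think it replaces the full (end-to-end) gradient and only works with an optimizer. I would like to replace the gradient of a single op while letting the gradients of the other ops propagate in their usual way; I would not necessarily know what to do with the end-to-end gradient. An example would be a tf.matmul() where the forward computation is done normally but the gradient is clip(grad, min, max), where grad is the original gradient, and which is used inside a larger graph. –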

Take a look at [compute_gradients](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer#compute_gradients): it returns a list of '(gradient, variable)' pairs, so I think you can modify only the gradients you want, like [this](https://github.com/KelvinLu/krotos-convnet/blob/e37218aeaf10b73d77dfac911be46d8ab689e41d/krotos/convnet/model/training.py#L27), by finding the 'var' you want. – xxi

The most general way to do this is to use https://www.tensorflow.org/api_docs/python/tf/RegisterGradient

Below I implemented backpropagated gradient clipping, which can be used with matmul, as shown here, or with any other op:

import tensorflow as tf 
import numpy as np 

# from https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342 
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, int(1e8)))

    # Register `grad` under that name and use it for the PyFunc op created below
    tf.RegisterGradient(rnd_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

def clip_grad(x, clip_value, name=None):
    """
    Scales the backpropagated gradient so that
    its L2 norm is no more than `clip_value`.
    """
    with tf.name_scope(name, "ClipGrad", [x]) as name:
        # Identity in the forward pass; clip_by_norm in the backward pass
        return py_func(lambda x: x,
                       [x],
                       [tf.float32],
                       name=name,
                       grad=lambda op, g: tf.clip_by_norm(g, clip_value))[0]

Example:

with tf.Session() as sess: 
    x = tf.constant([[1., 2.], [3., 4.]]) 
    y = tf.constant([[1., 2.], [3., 4.]]) 

    print('without clipping') 
    z = tf.matmul(x, y) 
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval()) 

    print('with clipping') 
    z = tf.matmul(clip_grad(x, 1.0), clip_grad(y, 0.5)) 
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval()) 

    print('with clipping between matmuls') 
    z = tf.matmul(clip_grad(tf.matmul(x, y), 1.0), y) 
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval())

Output:

without clipping
[[ 3. 7.]
 [ 3. 7.]]
with clipping
[[ 0.278543 0.6499337]
 [ 0.278543 0.6499337]]
with clipping between matmuls
[[ 1.57841039 3.43536377]
 [ 1.57841039 3.43536377]]
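The "with clipping" numbers above can be reproduced by hand, which makes it clearer what clipping by L2 norm does to the gradient. This is a minimal plain-Python sketch under the assumption that norm clipping rescales the whole tensor by `clip_norm / norm` when the norm exceeds the limit; the local `clip_by_norm` below is a hypothetical stand-in written for this check, not the TensorFlow function.

```python
import math

def clip_by_norm(values, clip_norm):
    """Rescale a flat list so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm <= clip_norm:
        return list(values)
    return [v * clip_norm / norm for v in values]

# Unclipped gradient of reduce_sum(matmul(x, y)) w.r.t. x from the output
# above, flattened row by row: [[3, 7], [3, 7]]
raw_grad = [3.0, 7.0, 3.0, 7.0]
clipped = clip_by_norm(raw_grad, 1.0)

print([round(v, 6) for v in clipped])
print(math.sqrt(sum(v * v for v in clipped)))  # L2 norm after clipping, about 1.0
```

The first two entries come out at roughly 0.278543 and 0.649934, matching the printed "with clipping" gradient: every element is divided by the original norm, sqrt(116).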

2017-05-12 06:19:21 MaxB

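MaxB: Thanks! This looks useful. I don't know how to define a new op in Python through... is it just a function with a decorator? Could you give a complete example of a matmul with a clipped gradient? –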

@AlexI It's not easy, but it's doable: http://stackoverflow.com/questions/37924071/tensorflow-writing-an-op-in-python If you only want to clip the gradient, I suggest you define an "identity op" that does nothing except clip the gradient. See also https://www.tensorflow.org/extend/adding_an_op#implement_the_gradient_in_python – MaxB

@AlexI I implemented actual backpropagated gradient clipping. See the edit. – MaxB

Suppose the forward computation is

y = f(x)

and you want it to backpropagate as if it were

y = b(x)

A simple hack would be:

y = b(x) + tf.stop_gradient(f(x) - b(x))
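Why the hack works: in the forward pass the two `b(x)` terms cancel, so the value is `f(x)`; in the backward pass the `stop_gradient` term contributes nothing, so only the gradient of `b(x)` remains. The following is a minimal plain-Python sketch of that reasoning using a tiny scalar autodiff. The `Value` class, the local `stop_gradient`, and the choices `f(x) = x * x` and `b(x) = 3 * x` are all made up for illustration and only mimic the behaviour of `tf.stop_gradient`.

```python
class Value:
    """A scalar that remembers its inputs and local derivatives."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        # parents: tuples of (input Value, d(self)/d(input))
        self.parents = parents

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Value(self.data - other.data, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self, upstream=1.0):
        # Accumulate the incoming gradient, then pass it on via the chain rule.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

def stop_gradient(v):
    # Same value, but no recorded parents, so nothing flows back through it.
    return Value(v.data)

def f(x):
    return x * x            # forward function (its true gradient at 2 is 4)

def b(x):
    return Value(3.0) * x   # function whose gradient we want instead (3)

x = Value(2.0)
y = b(x) + stop_gradient(f(x) - b(x))
y.backward()

print(y.data)  # 4.0, the forward value f(2)
print(x.grad)  # 3.0, the gradient of b, not of f
```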

2017-05-13 10:43:50 Bily

It should be tf.stop_gradient; fixed. @lvelin – Bily
