
Tensorflow: tf.gradients between different paths of the graph

I am working on a DDPG implementation, which requires computing the gradient of one network (below: critic) with respect to the output of another network (below: actor). My code already uses queues instead of feed dicts for most things, but I haven't been able to do that for this particular part yet:

import tensorflow as tf 
tf.reset_default_graph() 

states = tf.placeholder(tf.float32, (None,)) 
actions = tf.placeholder(tf.float32, (None,)) 

actor = states * 1 
critic = states * 1 + actions 

grads_indirect = tf.gradients(critic, actions) 
grads_direct = tf.gradients(critic, actor) 

with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer()) 

    act = sess.run(actor, {states: [1.]}) 
    print(act) # -> [1.] 
    cri = sess.run(critic, {states: [1.], actions: [2.]}) 
    print(cri) # -> [3.] 
    grad1 = sess.run(grads_indirect, {states: [1.], actions: act}) 
    print(grad1) # -> [[1.]] 
    grad2 = sess.run(grads_direct, {states: [1.], actions: [2.]}) 
    print(grad2) # -> TypeError: Fetch argument has invalid type 'NoneType' 

grad1 here computes the gradient w.r.t. the fed-in actions, which were previously computed by the actor. grad2 is supposed to do the same thing, but directly inside the graph, without re-feeding the actions, by evaluating the actor directly instead. The problem is that grads_direct is None:

print(grads_direct) # [None] 

How can I achieve this? Is there a dedicated "evaluate this tensor" operation I could make use of? Thanks!

Answer


In your example you never use actor to compute critic, so there is no path between them in the graph and the gradient is None.

You should do this instead:

actor = states * 1 
critic = actor + actions # change here 

grads_indirect = tf.gradients(critic, actions) 
grads_direct = tf.gradients(critic, actor) 
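
For completeness, a minimal check of the fix (a sketch, assuming TensorFlow 1.x as in the question): with critic now consuming the actor's output, grads_direct is a real tensor rather than [None] and can be evaluated in a session. Note that actions still has to be fed, since the add op's gradient needs the runtime shapes of both inputs:

with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer()) 

    # d(actor + actions)/d(actor) is 1 for each element of actor 
    grad2 = sess.run(grads_direct, {states: [1.], actions: [2.]}) 
    print(grad2) # -> [[1.]] 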