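Tensorflow: tf.gradients between different paths of the graph

I am working on a DDPG implementation that requires computing the gradients of one network (below: critic) with respect to the output of another network (below: actor). My code already uses queues instead of feed dicts for the most part, but I have not yet been able to do so for this specific part: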
import tensorflow as tf
tf.reset_default_graph()
states = tf.placeholder(tf.float32, (None,))
actions = tf.placeholder(tf.float32, (None,))
actor = states * 1
critic = states * 1 + actions
grads_indirect = tf.gradients(critic, actions)
grads_direct = tf.gradients(critic, actor)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    act = sess.run(actor, {states: [1.]})
    print(act) # -> [1.]
    cri = sess.run(critic, {states: [1.], actions: [2.]})
    print(cri) # -> [3.]
    grad1 = sess.run(grads_indirect, {states: [1.], actions: act})
    print(grad1) # -> [[1.]]
    grad2 = sess.run(grads_direct, {states: [1.], actions: [2.]})
    print(grad2) # -> TypeError: Fetch argument has invalid type 'NoneType'
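Here, grad1 computes the gradients w.r.t. the fed-in actions, which were previously computed by actor. grad2 should do the same, but directly inside the graph, without feeding the actions back in, by evaluating actor directly. The problem is that grads_direct is None: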
print(grads_direct) # [None]
How can I achieve this? Is there a dedicated "evaluate tensor" operation that I could use? Thanks!
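
For reference, a likely reason for the None: in the snippet above, critic is built only from states and actions, so the actor tensor is not on any path leading to critic, and tf.gradients returns None for an input the output does not depend on. The following is a minimal sketch of one way to wire the two together, not a definitive DDPG setup. It assumes the critic can be built by a small helper (the name build_critic is hypothetical) and that it is acceptable to build it twice: once on the actions placeholder and once directly on the actor output. The tensorflow.compat.v1 import is also an assumption, used so that the graph-mode code also runs on TensorFlow 2.x.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # assumption: run in TF1-style graph mode
tf.reset_default_graph()

states = tf.placeholder(tf.float32, (None,))
actions = tf.placeholder(tf.float32, (None,))

actor = states * 1


def build_critic(state_input, action_input):
    # Hypothetical helper: same toy critic as in the question,
    # parameterised by the tensor that provides the actions.
    return state_input * 1 + action_input


# Critic evaluated on actions that are fed from outside the graph.
critic_fed = build_critic(states, actions)
# Critic evaluated on the actor output, so actor is on the path to it.
critic_on_actor = build_critic(states, actor)

grads_indirect = tf.gradients(critic_fed, actions)
grads_direct = tf.gradients(critic_on_actor, actor)

# grads_direct[0] is now a tensor rather than None.
print(grads_direct)

with tf.Session() as sess:
    # Only states has to be fed; the actions come from actor in-graph.
    grad_direct_val = sess.run(grads_direct, {states: [1.]})
    print(grad_direct_val)
```

With real networks, the two critic copies would have to share their variables (for example through variable scope reuse), which this toy example does not need because it has no variables.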