How to share gradients and variables in the Adam optimizer when using bucketing in TensorFlow?
I used the bucketing technique for a seq2seq task:
# For different lengths in the encoder and decoder,
# build one model per bucket and reuse the variables after the first one.
from tensorflow.python.ops import variable_scope

model_map = {}
tt = 0  # counts models built so far
for i in encoder_shape:
    for j in decoder_shape:
        with variable_scope.variable_scope(variable_scope.get_variable_scope(),
                                           reuse=True if tt > 0 else None):
            model = Seq2SeqModel()
            model.build(encoder[:i], decoder[:j])
            model_map[i * 100 + j] = model
        tt += 1
The models share their parameters:
for t in tf.all_variables():
    print t.name, t.get_shape()
This prints:
embedding_attention_seq2seq/RNN/EmbeddingWrapper/embedding:0 (50000, 256)
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Matrix:0 (1056, 1600)
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Bias:0 (1600,)
Each model is optimized like this:
# every model has its own optimizer
params = tf.trainable_variables()
opt = tf.train.AdamOptimizer(1e-3)
gradients = tf.gradients(self.loss, params)
self.optimizer = opt.apply_gradients(zip(gradients, params))
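For context, Adam keeps two moment accumulators ("slot" variables) per trainable variable, and these slots live inside each optimizer instance rather than in the shared variable scope. A minimal sketch for listing them in the graph above (the '/Adam' name filter is an assumption about how the slots are named in this TF version):

# Sketch: list Adam's slot variables (the per-variable moment accumulators).
# Each tf.train.AdamOptimizer instance creates its own "m" and "v" slots,
# which appear as <var_name>/Adam and <var_name>/Adam_1 in the graph.
for v in tf.all_variables():
    if '/Adam' in v.name:
        print v.name, v.get_shape()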
But I found that the optimizers do not share their variables:
embedding_attention_seq2seq/RNN/EmbeddingWrapper/embedding/Adam:0 (50000, 256)
embedding_attention_seq2seq/RNN/EmbeddingWrapper/embedding/Adam_1:0 (50000, 256)
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Matrix/Adam:0 (1056, 1600)
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Matrix/Adam_1:0 (1056, 1600)
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Bias/Adam:0 (1600,)
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Bias/Adam_1:0 (1600,)
embedding_attention_seq2seq/RNN/EmbeddingWrapper/embedding/Adam_2:0 (50000, 256)
embedding_attention_seq2seq/RNN/EmbeddingWrapper/embedding/Adam_3:0 (50000, 256)
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Matrix/Adam_2:0 (1056, 1600)
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Matrix/Adam_3:0 (1056, 1600)
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Bias/Adam_2:0 (1600,)
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Bias/Adam_3:0 (1600,)
As the number of buckets grows, GPU memory usage grows with it, and tf.train.Saver.save() also writes a larger model checkpoint.
So, is it possible to share the gradients across buckets in TensorFlow?
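For reference, here is one workaround sketch, not a confirmed fix: create a single tf.train.AdamOptimizer outside the bucket loop and call apply_gradients once per bucket on that same instance; within one instance, the slot variables are created once per trainable variable and then reused. model_map and model.loss below reuse the names from the code above, and train_ops is a hypothetical container.

# Sketch only: one shared optimizer instance for all buckets,
# so the Adam slots (m, v) are created once per trainable variable.
params = tf.trainable_variables()
opt = tf.train.AdamOptimizer(1e-3)  # built once, outside the bucket loop

train_ops = {}  # hypothetical: one training op per bucket
for key, model in model_map.items():
    gradients = tf.gradients(model.loss, params)
    # Calling apply_gradients again on the same instance reuses its slots
    # instead of minting a new Adam/Adam_1 pair for every bucket.
    train_ops[key] = opt.apply_gradients(zip(gradients, params))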
@Lukasz Kaiser, please give some help. – Issac