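Implementing attention with beam search in TensorFlow

I have written my own code with reference to this wonderful tutorial, but I cannot get results when using attention with beam search. As I understand the AttentionModel class, its _build_decoder_cell function creates a separate decoder cell and attention wrapper for inference mode. My code assumes the same (I think this is incorrect, but I cannot find a way around it):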

    # Imports assumed by this snippet (TensorFlow 1.x).
    import tensorflow as tf
    from tensorflow.python.layers.core import Dense

    with tf.name_scope("Decoder"):
        mem_units = 2 * dim
        dec_cell = tf.contrib.rnn.BasicLSTMCell(2 * dim)
        beam_cel = tf.contrib.rnn.BasicLSTMCell(2 * dim)
        beam_width = 3
        out_layer = Dense(output_vocab_size)

        with tf.name_scope("Training"):
            attn_mech = tf.contrib.seq2seq.BahdanauAttention(
                num_units=mem_units, memory=enc_rnn_out, normalize=True)
            attn_cell = tf.contrib.seq2seq.AttentionWrapper(
                cell=dec_cell, attention_mechanism=attn_mech)
            batch_size = tf.shape(enc_rnn_out)[0]
            initial_state = attn_cell.zero_state(batch_size=batch_size, dtype=tf.float32)
            initial_state = initial_state.clone(cell_state=enc_rnn_state)
            helper = tf.contrib.seq2seq.TrainingHelper(inputs=emb_x_y, sequence_length=seq_len)
            decoder = tf.contrib.seq2seq.BasicDecoder(
                cell=attn_cell, helper=helper, initial_state=initial_state,
                output_layer=out_layer)
            outputs, final_state, final_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(
                decoder=decoder, impute_finished=True)
            training_logits = tf.identity(outputs.rnn_output)
            training_pred = tf.identity(outputs.sample_id)

        with tf.name_scope("Inference"):
            enc_rnn_out_beam = tf.contrib.seq2seq.tile_batch(enc_rnn_out, beam_width)
            seq_len_beam = tf.contrib.seq2seq.tile_batch(seq_len, beam_width)
            enc_rnn_state_beam = tf.contrib.seq2seq.tile_batch(enc_rnn_state, beam_width)
            # The batch size is now beam_width times larger.
            batch_size_beam = tf.shape(enc_rnn_out_beam)[0]
            # start_tokens must have the original (untiled) batch size, so divide by beam_width.
            start_tokens = tf.tile(tf.constant([27], dtype=tf.int32),
                                   [batch_size_beam // beam_width])
            end_token = 0
            attn_mech_beam = tf.contrib.seq2seq.BahdanauAttention(
                num_units=mem_units, memory=enc_rnn_out_beam, normalize=True)
            cell_beam = tf.contrib.seq2seq.AttentionWrapper(
                cell=beam_cel, attention_mechanism=attn_mech_beam,
                attention_layer_size=mem_units)
            initial_state_beam = cell_beam.zero_state(
                batch_size=batch_size_beam, dtype=tf.float32).clone(
                    cell_state=enc_rnn_state_beam)
            my_decoder = tf.contrib.seq2seq.BeamSearchDecoder(
                cell=cell_beam,
                embedding=emb_out,
                start_tokens=start_tokens,
                end_token=end_token,
                initial_state=initial_state_beam,
                beam_width=beam_width,
                output_layer=out_layer)
            beam_output, t1, t2 = tf.contrib.seq2seq.dynamic_decode(
                my_decoder, maximum_iterations=maxlen)
            beam_logits = tf.no_op()
            beam_sample_id = beam_output.predicted_ids

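
For context, the inference branch asks BeamSearchDecoder to keep the beam_width best partial sequences at every step, and predicted_ids holds the resulting token ids with shape [batch_size, time, beam_width]. The following is a minimal pure-Python sketch of that search for a single example, not TensorFlow's implementation: `beam_search` and `toy_model` are hypothetical names, the probabilities are made up, and no length penalty is applied. It only borrows the start token 27, end token 0 and beam width 3 from the code in the question.

```python
import math


def beam_search(step_log_probs, start_token, end_token, beam_width, max_len):
    """Toy beam search for one example.

    `step_log_probs(prefix)` is a hypothetical callback returning
    {token: log_prob} for the token that follows `prefix`.
    """
    beams = [([start_token], 0.0)]  # (tokens so far, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == end_token:
                # Finished hypotheses are carried over unchanged.
                candidates.append((tokens, score))
                continue
            for token, logp in step_log_probs(tokens).items():
                candidates.append((tokens + [token], score + logp))
        # Keep only the beam_width highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(tokens[-1] == end_token for tokens, _ in beams):
            break
    return beams


def toy_model(prefix):
    # Made-up distribution: after the start token prefer token 1,
    # afterwards prefer the end token 0.
    if len(prefix) == 1:
        return {1: math.log(0.6), 2: math.log(0.3), 0: math.log(0.1)}
    return {0: math.log(0.9), 1: math.log(0.1)}


hypotheses = beam_search(toy_model, start_token=27, end_token=0,
                         beam_width=3, max_len=5)
best_tokens = hypotheses[0][0]
```

With this toy distribution the highest-scoring hypothesis is [27, 1, 0]. In the real decoder the scores come from the attention cell and the output layer, so untrained weights in the inference branch would produce arbitrary rankings.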
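When I evaluate beam_sample_id after training, I do not get correct results.
My guess is that we are supposed to use the same attention wrapper for training and inference, but that is not possible because beam search requires the encoder tensors to be tiled with tile_batch.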
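
The tiling that the inference branch relies on can be pictured as follows. tile_batch repeats each batch entry beam_width times and keeps the copies adjacent, which is why the tiled batch size has to be divided by beam_width to recover the number of start tokens. A minimal pure-Python sketch of that layout, where `tile_batch_like` is a hypothetical stand-in that works on lists rather than tensors:

```python
def tile_batch_like(batch, multiplier):
    """Repeat every batch entry `multiplier` times, keeping the copies adjacent."""
    return [entry for entry in batch for _ in range(multiplier)]


beam_width = 3
encoder_outputs = ["example_0", "example_1"]  # placeholder batch of size 2

tiled = tile_batch_like(encoder_outputs, beam_width)
# Layout: example_0 three times, then example_1 three times.

original_batch_size = len(tiled) // beam_width
```

Because the memory passed to the attention mechanism has a different batch dimension in the two branches, the wrappers are built separately; whether they also share weights is a separate question of variable naming.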
Any insights or suggestions would be much appreciated.
I have also created an issue for this in their main repository: Issue-93.
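Yes, I was not able to use the weights learned during training with my approach. tf.name_scope() has no `reuse` argument in version 1.3; you must mean tf.variable_scope(). I solved this by creating two separate graphs for training and inference, as @dnnavn pointed out in the [GitHub issue](https://github.com/tensorflow/nmt/issues/93). He claims it can only be done with separate graphs; I need to try it out. Meanwhile, if you have tried it successfully, please do comment. Thanks.
Yes, tf.variable_scope instead of tf.name_scope.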
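
The point about `reuse` can be illustrated without TensorFlow. In TensorFlow 1.x, tf.get_variable looks variables up by their scoped name, and a variable scope opened with reuse=True returns the existing variable instead of creating a new one; tf.name_scope does not take part in that lookup. The sketch below is a toy model of this behaviour, not TensorFlow's implementation: `ToyVariableStore` and the variable names are hypothetical.

```python
class ToyVariableStore:
    """Toy name-keyed variable store; an illustration, not TensorFlow code."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, reuse=False):
        if name in self._vars:
            if not reuse:
                raise ValueError("Variable %s already exists; set reuse=True to share it" % name)
            return self._vars[name]
        if reuse:
            raise ValueError("Variable %s does not exist; nothing to reuse" % name)
        self._vars[name] = {"value": 0.0}  # freshly initialised weight
        return self._vars[name]


store = ToyVariableStore()

# Training branch creates the weight and (pretend) training updates it.
train_kernel = store.get_variable("decoder/lstm/kernel")
train_kernel["value"] = 1.5

# Inference branch built under a different name: a new, untrained weight.
unshared_kernel = store.get_variable("decoder_inference/lstm/kernel")

# Inference branch built under the same name with reuse: the trained weight.
shared_kernel = store.get_variable("decoder/lstm/kernel", reuse=True)
```

Only the lookup under the same name with reuse returns the trained weight. Separate training and inference graphs avoid the issue differently: both graphs create variables with identical names, and a checkpoint saved from one can be restored into the other.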
Hi, same here. I can see you have marked this answer as correct; did you get a chance to test it on data?