The implementation of the attention decoder in TensorFlow r1.0 confuses me. The original code can be found here: https://github.com/tensorflow/tensorflow/blob/r1.0/tensorflow/contrib/seq2seq/python/ops/attention_decoder_fn.py
Here is the part of the code that confuses me:
def decoder_fn(time, cell_state, cell_input, cell_output, context_state):
    if cell_state is None:  # first call, return encoder_state
        cell_state = encoder_state
        # init attention
        attention = _init_attention(encoder_state)
    else:
        # construct attention
        attention = attention_construct_fn(cell_output, attention_keys,
                                           attention_values)
        # in the doc, they said they won't change the cell_output
        cell_output = attention
    # combine cell_input and attention
    next_input = array_ops.concat([cell_input, attention], 1)
    return (None, cell_state, next_input, cell_output, context_state)
In my understanding, the decoder receives the state from the previous step and produces a new hidden state plus an output. Attention is computed from the RNN's previous hidden state and the current input, and the final decoder output is simply the collection of RNN outputs at every time step.
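To make what I mean concrete, here is a minimal NumPy sketch of the step I have in mind (roughly Bahdanau-style). Everything here (the toy RNN cell, the shapes, the names) is my own illustration, not TensorFlow code: attention is scored against the previous decoder hidden state, and the decoder output at each step is the RNN output itself.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def toy_rnn_step(x, h, W_x, W_h):
    # toy RNN cell: the new hidden state doubles as the output
    h_new = np.tanh(x @ W_x + h @ W_h)
    return h_new, h_new

def textbook_attention_step(prev_hidden, cur_input, enc_states, W_x, W_h):
    # score encoder states against the *previous* decoder hidden state
    scores = enc_states @ prev_hidden            # (T_enc,)
    weights = softmax(scores)
    context = weights @ enc_states               # weighted sum of encoder states
    # feed [input; context] to the RNN; the RNN output is the decoder output
    rnn_in = np.concatenate([cur_input, context])
    new_hidden, output = toy_rnn_step(rnn_in, prev_hidden, W_x, W_h)
    return new_hidden, output

# usage with random toy dimensions
hidden, inp, T_enc = 4, 3, 5
rng = np.random.default_rng(0)
enc_states = rng.normal(size=(T_enc, hidden))
W_x = rng.normal(size=(inp + hidden, hidden))
W_h = rng.normal(size=(hidden, hidden))
h, y = textbook_attention_step(np.zeros(hidden), rng.normal(size=inp), enc_states, W_x, W_h)

In this version, the output that is eventually collected is the RNN's own output, and attention only influences the input to the cell.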
However, it looks like in TensorFlow the decoder returns the attention vector as its output, and at each time step it uses the RNN's output to compute that attention.
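As far as I can tell, the decoder_fn above corresponds to a data flow like the following sketch. This is only to contrast with my version; the real attention_construct_fn is more involved, and my attention line is just a stand-in for it:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def tf_style_step(cell_output, cell_input, attention_values):
    # attention is constructed from the *current* RNN output
    scores = attention_values @ cell_output          # (T_enc,)
    context = softmax(scores) @ attention_values     # weighted encoder summary
    attention = np.tanh(context + cell_output)       # stand-in for attention_construct_fn
    # the attention vector replaces the cell output ...
    new_cell_output = attention
    # ... and is also concatenated with the next input
    next_input = np.concatenate([cell_input, attention])
    return next_input, new_cell_output

So here the value that gets emitted at each step is the attention vector, not the raw RNN output.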
Is the implementation in TensorFlow wrong? In practice, though, this implementation (the TensorFlow one) actually performs better.
Thanks!