
Tensorflow seq2seq get sequence hidden state

I started working with TensorFlow not long ago. I've been studying the seq2seq model and somehow got the tutorial to work, but I'm stuck on getting the hidden state of each sentence.

As far as I understand it, the seq2seq model takes an input sequence and generates a hidden state for the sequence through an RNN. Later, the model uses the sequence's hidden state to generate a new sequence of data.
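For concreteness, here is a minimal sketch of that encoder flow using the TensorFlow 0.x-era list-based RNN API this question is about. The cell size, shapes, and placeholder setup are illustrative assumptions, not code from the tutorial:

import tensorflow as tf

batch_size, num_steps, num_units = 32, 10, 128
cell = tf.nn.rnn_cell.GRUCell(num_units)
# The old list-based API takes one tensor per time step.
encoder_inputs = [tf.placeholder(tf.float32, [batch_size, num_units])
                  for _ in xrange(num_steps)]
# tf.nn.rnn returns (outputs, final_state); the final state is the
# fixed-size summary of the whole input sequence that a decoder
# would be initialized with.
_, encoder_state = tf.nn.rnn(cell, encoder_inputs, dtype=tf.float32)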

My question is: what should I do if I want to use the hidden state of an input sequence directly? Say, if I have a trained model, how should I get the final hidden state of the input sequence [token1, token2, ..., tokenN]?

I have been stuck on this for 2 days; I tried many different approaches, but none of them worked.

Answers


In the seq2seq model, the encoder is always an RNN, invoked through rnn.rnn.

The call to rnn.rnn returns the outputs and the state, so to get just the state you can do this:

_, encoder_state = rnn.rnn(encoder_cell, encoder_inputs, dtype=dtype)

It is done the same way in the seq2seq module: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/seq2seq.py#L103


Thanks for your reply. I did find that line of code; the problem is that I don't know how to get at it. Taking the translation model in tensorflow as an example, we first build a seq2seq model class named model and run 'model.step()' to train it. If I understand it correctly, it is invoked through https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/translate/seq2seq_model.py#L149, but from there I am stuck – bearsteak


Well, I guess my problem is that I don't really know how to code in the tensorflow style, so I kind of brute-forced it.

(* marks the modified lines)

In python/ops/seq2seq.py, modify model_with_buckets():

losses = []
outputs = []
*states = []
with ops.op_scope(all_inputs, name, "model_with_buckets"):
  for j in xrange(len(buckets)):
    if j > 0:
      vs.get_variable_scope().reuse_variables()
    bucket_encoder_inputs = [encoder_inputs[i]
                             for i in xrange(buckets[j][0])]
    bucket_decoder_inputs = [decoder_inputs[i]
                             for i in xrange(buckets[j][1])]
    # seq2seq now returns a third value: the encoder's final state.
    *bucket_outputs, _, bucket_states = seq2seq(bucket_encoder_inputs,
                                                bucket_decoder_inputs)
    outputs.append(bucket_outputs)
    states.append(bucket_states)
    bucket_targets = [targets[i] for i in xrange(buckets[j][1])]
    bucket_weights = [weights[i] for i in xrange(buckets[j][1])]
    losses.append(sequence_loss(
        outputs[-1], bucket_targets, bucket_weights, num_decoder_symbols,
        softmax_loss_function=softmax_loss_function))

return outputs, losses, *states

In python/ops/seq2seq.py, modify embedding_attention_seq2seq():

if isinstance(feed_previous, bool):
  *outputs, states = embedding_attention_decoder(
      decoder_inputs, encoder_states[-1], attention_states, cell,
      num_decoder_symbols, num_heads, output_size, output_projection,
      feed_previous)
  # Return the encoder's final state tensor directly
  # (tf.constant cannot wrap a Tensor).
  *return outputs, states, encoder_states[-1]
else:  # If feed_previous is a Tensor, we construct 2 graphs and use cond.
  outputs1, states1 = embedding_attention_decoder(
      decoder_inputs, encoder_states[-1], attention_states, cell,
      num_decoder_symbols, num_heads, output_size, output_projection, True)
  vs.get_variable_scope().reuse_variables()
  outputs2, states2 = embedding_attention_decoder(
      decoder_inputs, encoder_states[-1], attention_states, cell,
      num_decoder_symbols, num_heads, output_size, output_projection, False)

  outputs = control_flow_ops.cond(feed_previous,
                                  lambda: outputs1, lambda: outputs2)
  states = control_flow_ops.cond(feed_previous,
                                 lambda: states1, lambda: states2)

  *return outputs, states, encoder_states[-1]

In models/rnn/translate/seq2seq_model.py, modify __init__():

if forward_only:
  *self.outputs, self.losses, self.states = seq2seq.model_with_buckets(
      self.encoder_inputs, self.decoder_inputs, targets,
      self.target_weights, buckets, self.target_vocab_size,
      lambda x, y: seq2seq_f(x, y, True),
      softmax_loss_function=softmax_loss_function)
  # If we use output projection, we need to project outputs for decoding.
  if output_projection is not None:
    for b in xrange(len(buckets)):
      self.outputs[b] = [tf.nn.xw_plus_b(output, output_projection[0],
                                         output_projection[1])
                         for output in self.outputs[b]]
else:
  *self.outputs, self.losses, _ = seq2seq.model_with_buckets(
      self.encoder_inputs, self.decoder_inputs, targets,
      self.target_weights, buckets, self.target_vocab_size,
      lambda x, y: seq2seq_f(x, y, False),
      softmax_loss_function=softmax_loss_function)

In models/rnn/translate/seq2seq_model.py, modify step():

if not forward_only:
  return outputs[1], outputs[2], None  # Gradient norm, loss, no outputs.
else:
  # No gradient norm; loss, output logits, and the encoder state.
  *return None, outputs[0], outputs[1:-1], outputs[-1]
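For that last return value to exist, the encoder-state tensor also has to be fetched in the same session.run() call. The answer does not show that change, but step() presumably needs its output_feed extended in the forward_only branch, roughly like this (an assumption based on the surrounding step() code, using the same * convention):

else:
  output_feed = [self.losses[bucket_id]]  # Loss for this batch.
  for l in xrange(decoder_size):  # Output logits.
    output_feed.append(self.outputs[bucket_id][l])
  *output_feed.append(self.states[bucket_id])  # Fetch the encoder state too.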

After all of this is done, we can get the encoder state by calling:

_, _, _, states = model.step(all_other_arguments, forward_only=True)


Thanks for your answer, but your changes are based on 0.6, which is outdated. Do you have the corresponding changes for the latest 0.11 or newer? – ccy


bearsteak's answer above is great, but it is based on tensorflow-0.6, which is quite outdated. So I am updating his answer here for tensorflow-0.8, which is similar to the newest version.

(* marks the modified lines)

In python/ops/seq2seq.py, modify model_with_buckets():

losses = []
outputs = []
*states = []
with ops.op_scope(all_inputs, name, "model_with_buckets"):
  for j, bucket in enumerate(buckets):
    with variable_scope.variable_scope(variable_scope.get_variable_scope(),
                                       reuse=True if j > 0 else None):
      # seq2seq now returns a third value: the encoder's final state.
      *bucket_outputs, _, bucket_states = seq2seq(encoder_inputs[:bucket[0]],
                                                  decoder_inputs[:bucket[1]])
      outputs.append(bucket_outputs)
      # Missing from the original answer, but needed to fill `states`.
      *states.append(bucket_states)
      if per_example_loss:
        losses.append(sequence_loss_by_example(
            outputs[-1], targets[:bucket[1]], weights[:bucket[1]],
            softmax_loss_function=softmax_loss_function))
      else:
        losses.append(sequence_loss(
            outputs[-1], targets[:bucket[1]], weights[:bucket[1]],
            softmax_loss_function=softmax_loss_function))

return outputs, losses, *states

In python/ops/seq2seq.py, modify embedding_attention_seq2seq():

if isinstance(feed_previous, bool):
  *outputs, states = embedding_attention_decoder(
      decoder_inputs, encoder_state, attention_states, cell,
      num_decoder_symbols, embedding_size, num_heads=num_heads,
      output_size=output_size, output_projection=output_projection,
      feed_previous=feed_previous,
      initial_state_attention=initial_state_attention)
  *return outputs, states, encoder_state

# If feed_previous is a Tensor, we construct 2 graphs and use cond.
def decoder(feed_previous_bool):
  reuse = None if feed_previous_bool else True
  with variable_scope.variable_scope(variable_scope.get_variable_scope(),
                                     reuse=reuse):
    outputs, state = embedding_attention_decoder(
        decoder_inputs, encoder_state, attention_states, cell,
        num_decoder_symbols, embedding_size, num_heads=num_heads,
        output_size=output_size, output_projection=output_projection,
        feed_previous=feed_previous_bool,
        update_embedding_for_previous=False,
        initial_state_attention=initial_state_attention)
    return outputs + [state]

outputs_and_state = control_flow_ops.cond(feed_previous,
                                          lambda: decoder(True),
                                          lambda: decoder(False))
*return outputs_and_state[:-1], outputs_and_state[-1], encoder_state

In models/rnn/translate/seq2seq_model.py, modify __init__():

if forward_only:
  *self.outputs, self.losses, self.states = tf.nn.seq2seq.model_with_buckets(
      self.encoder_inputs, self.decoder_inputs, targets,
      self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, True),
      softmax_loss_function=softmax_loss_function)
  # If we use output projection, we need to project outputs for decoding.
  if output_projection is not None:
    for b in xrange(len(buckets)):
      self.outputs[b] = [
          tf.matmul(output, output_projection[0]) + output_projection[1]
          for output in self.outputs[b]
      ]
else:
  *self.outputs, self.losses, _ = tf.nn.seq2seq.model_with_buckets(
      self.encoder_inputs, self.decoder_inputs, targets,
      self.target_weights, buckets,
      lambda x, y: seq2seq_f(x, y, False),
      softmax_loss_function=softmax_loss_function)

In models/rnn/translate/seq2seq_model.py, modify step():

if not forward_only:
  return outputs[1], outputs[2], None  # Gradient norm, loss, no outputs.
else:
  # No gradient norm; loss, output logits, and the encoder state
  # (outputs[1:-1] excludes the fetched state from the logits).
  *return None, outputs[0], outputs[1:-1], outputs[-1]
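As in bearsteak's answer, this assumes the forward_only branch of step() also appends the encoder-state tensor to output_feed (output_feed.append(self.states[bucket_id])), so that outputs[-1] actually holds the fetched state.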

After all of this is done, we can get the encoder state in translate.py by calling:

_, _, output_logits, states = model.step(sess, encoder_inputs, decoder_inputs,
                                         target_weights, bucket_id, True)
print(states)
