Is the RNN initial state reset for subsequent mini-batches?

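Could someone clarify whether, in TF, the initial state of the RNN is reset for subsequent mini-batches, or whether the last state of the previous mini-batch is used, as mentioned in Ilya Sutskever et al., ICLR 2015?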

2016-07-18 VM_AI

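The tf.nn.dynamic_rnn() and tf.nn.rnn() operations let you specify the initial state of the RNN through the initial_state parameter. If you don't specify this parameter, the hidden states are initialized to zero vectors at the beginning of each training batch.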
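The reset-versus-carry behaviour can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not TensorFlow's implementation: `rnn_step`, `run_rnn`, and the weights `W_X`/`W_H` are hypothetical names chosen for illustration. Leaving out the initial state starts each batch from zero, while passing in the previous batch's final state makes two consecutive batches behave like one longer sequence.

```python
import math

# Hypothetical scalar RNN cell: h_t = tanh(W_X * x_t + W_H * h_{t-1}).
# The weights are arbitrary illustration values, not learned ones.
W_X, W_H = 0.5, 0.9


def rnn_step(x, h):
    return math.tanh(W_X * x + W_H * h)


def run_rnn(inputs, initial_state=None):
    """Unroll the cell over a sequence and return (outputs, final_state).

    Mirrors the behaviour described for dynamic_rnn: when no initial state
    is given, the state starts at zero.
    """
    h = 0.0 if initial_state is None else initial_state
    outputs = []
    for x in inputs:
        h = rnn_step(x, h)
        outputs.append(h)
    return outputs, h


batch1, batch2 = [1.0, 2.0, 3.0], [4.0, 5.0]

# Default: every batch starts from a zero state (the state is "reset").
out_reset, _ = run_rnn(batch2)

# Carrying the state: the final state of batch1 becomes the initial state of batch2.
_, last_state = run_rnn(batch1)
out_carried, _ = run_rnn(batch2, initial_state=last_state)

# Carrying the state gives the same outputs as processing both batches as one sequence.
out_full, _ = run_rnn(batch1 + batch2)
```

Carrying the state over is only meaningful when each sequence in the second batch actually continues the corresponding sequence of the first batch.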

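In TensorFlow, you can wrap tensors in tf.Variable() to keep their values between multiple session runs. Just make sure to mark them as non-trainable, since optimizers tune all trainable variables by default.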

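import tensorflow as tf  # TensorFlow 1.x graph-mode API

batch_size, max_length, frame_size = 32, 100, 40  # hypothetical example sizes

data = tf.placeholder(tf.float32, (batch_size, max_length, frame_size))

cell = tf.nn.rnn_cell.GRUCell(256)
# Keep the state in a non-trainable variable so it persists across sess.run() calls.
state = tf.Variable(cell.zero_state(batch_size, tf.float32), trainable=False)
output, new_state = tf.nn.dynamic_rnn(cell, data, initial_state=state)

# Store the final state so that the next batch starts from it.
with tf.control_dependencies([state.assign(new_state)]):
    output = tf.identity(output)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(output, {data: ...})  # feed a batch of shape (batch_size, max_length, frame_size)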

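I haven't tested this code, but it should point you in the right direction. There is also tf.nn.state_saving_rnn(), to which you can provide a state saver object, but I haven't used it yet.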
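The variable-plus-assign pattern can be sketched in plain Python to show the data flow. This is a hypothetical analogue, not TensorFlow code: `StatefulRNN`, `rnn_step`, and the weights are illustration names. The attribute `self.state` plays the role of the non-trainable state variable, reading it corresponds to passing it as `initial_state`, and overwriting it corresponds to `state.assign(new_state)`. The `reset()` method is an assumption added here to show what re-assigning the zero state would do.

```python
import math

W_X, W_H = 0.5, 0.9  # arbitrary illustration weights


def rnn_step(x, h):
    return math.tanh(W_X * x + W_H * h)


class StatefulRNN:
    """Pure-Python analogue of keeping the RNN state in a tf.Variable.

    self.state survives between calls to run(), just as a variable's value
    survives between sess.run() calls.
    """

    def __init__(self):
        self.state = 0.0  # like initializing the variable with zero_state

    def run(self, batch):
        h = self.state  # read the stored state as the initial state
        outputs = []
        for x in batch:
            h = rnn_step(x, h)
            outputs.append(h)
        self.state = h  # like state.assign(new_state) after the batch
        return outputs

    def reset(self):
        self.state = 0.0  # like assigning the zero state again


rnn = StatefulRNN()
first = rnn.run([1.0, 2.0])
second = rnn.run([1.0, 2.0])  # starts from the state left by the first call
rnn.reset()
third = rnn.run([1.0, 2.0])   # starts from zero again
```

The same input produces different outputs on the second call because the state carried over, and identical outputs again after the reset.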


2016-07-19 18:11:12 danijar

In addition to danijar's answer, here is code for an LSTM whose state is a tuple (state_is_tuple=True). It also supports multiple layers.

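We define two functions: one that creates the state variables initialized with the zero state, and one that returns an operation we can pass to session.run in order to update the state variables with the LSTM's last hidden state.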

import tensorflow as tf  # TensorFlow 1.x API with tf.contrib


def get_state_variables(batch_size, cell):
    # For each layer, get the initial state and make a variable out of it
    # to enable updating its value.
    state_variables = []
    for state_c, state_h in cell.zero_state(batch_size, tf.float32):
        state_variables.append(tf.contrib.rnn.LSTMStateTuple(
            tf.Variable(state_c, trainable=False),
            tf.Variable(state_h, trainable=False)))
    # Return as a tuple, so that it can be fed to dynamic_rnn as an initial state
    return tuple(state_variables)


def get_state_update_op(state_variables, new_states):
    # Add an operation to update the train states with the last state tensors
    update_ops = []
    for state_variable, new_state in zip(state_variables, new_states):
        # Assign the new state to the state variables on this layer
        update_ops.extend([state_variable[0].assign(new_state[0]),
                           state_variable[1].assign(new_state[1])])
    # Return a tuple in order to combine all update_ops into a single operation.
    # The tuple's actual value should not be used.
    return tf.tuple(update_ops)
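The logic of the two helpers above (wrap every c and h of every layer in its own variable, then assign the new values layer by layer) can be mirrored in plain Python. This sketch is hypothetical: `StateTuple` stands in for `LSTMStateTuple`, `Var` stands in for a non-trainable `tf.Variable`, and `make_state_variables`, `make_state_update`, and `read_state` are illustration names, not TensorFlow APIs.

```python
from collections import namedtuple

# Hypothetical stand-in for tf.contrib.rnn.LSTMStateTuple: a (c, h) pair per layer.
StateTuple = namedtuple("StateTuple", ["c", "h"])


class Var:
    """Minimal mutable holder standing in for a non-trainable tf.Variable."""

    def __init__(self, value):
        self.value = value

    def assign(self, value):
        self.value = value
        return self


def zero_state(num_layers, units):
    # Like MultiRNNCell.zero_state: one (c, h) pair of zero vectors per layer.
    return tuple(StateTuple([0.0] * units, [0.0] * units) for _ in range(num_layers))


def make_state_variables(initial_state):
    # Wrap every c and h of every layer in its own variable.
    return tuple(StateTuple(Var(c), Var(h)) for c, h in initial_state)


def make_state_update(state_variables, new_states):
    # Assign the new (c, h) of each layer to that layer's variables.
    updates = []
    for state_variable, new_state in zip(state_variables, new_states):
        updates.extend([state_variable.c.assign(new_state.c),
                        state_variable.h.assign(new_state.h)])
    return tuple(updates)


def read_state(state_variables):
    # Read the current values back as a tuple of StateTuples.
    return tuple(StateTuple(v.c.value, v.h.value) for v in state_variables)


variables = make_state_variables(zero_state(num_layers=2, units=3))
new = tuple(StateTuple([1.0] * 3, [2.0] * 3) for _ in range(2))
updates = make_state_update(variables, new)
```

With two layers there are four assignments, one for each c and one for each h.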

Similar to danijar's answer, we can then update the LSTM's state after each batch:

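import tensorflow as tf  # TensorFlow 1.x API with tf.contrib

batch_size, max_length, frame_size, num_layers = 32, 100, 40, 2  # hypothetical example sizes

data = tf.placeholder(tf.float32, (batch_size, max_length, frame_size))
# Create a separate LSTM cell for each layer; each layer's state is an LSTMStateTuple (c, h).
cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.LSTMCell(256, state_is_tuple=True) for _ in range(num_layers)])

# For each layer, get the initial state. states will be a tuple of LSTMStateTuples.
states = get_state_variables(batch_size, cell)

# Unroll the LSTM
outputs, new_states = tf.nn.dynamic_rnn(cell, data, initial_state=states)

# Add an operation to update the train states with the last state tensors.
update_op = get_state_update_op(states, new_states)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run([outputs, update_op], {data: ...})  # feed a batch of shape (batch_size, max_length, frame_size)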

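The main difference is that with state_is_tuple=True, the LSTM's state is an LSTMStateTuple containing two variables (the cell state and the hidden state) rather than a single variable. Using multiple layers then makes the LSTM's state a tuple of LSTMStateTuples, one per layer.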
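To make the nesting concrete, here is a small pure-Python sketch of the state structure for a multi-layer LSTM. `StateTuple` is a hypothetical stand-in for `LSTMStateTuple`, and the sizes are illustration values. Flattening the structure shows that a model with `num_layers` layers needs `2 * num_layers` state variables, each of shape (batch_size, units).

```python
from collections import namedtuple

# Hypothetical stand-in for LSTMStateTuple (cell state c, hidden state h).
StateTuple = namedtuple("StateTuple", ["c", "h"])

num_layers, batch_size, units = 3, 2, 4  # illustration sizes


def zeros(rows, cols):
    # A rows x cols matrix of zeros as nested lists.
    return [[0.0] * cols for _ in range(rows)]


# One StateTuple per layer; each of c and h has shape (batch_size, units).
state = tuple(StateTuple(zeros(batch_size, units), zeros(batch_size, units))
              for _ in range(num_layers))

# Flatten the nested structure into the individual tensors that each need a variable.
flat = [tensor for layer_state in state for tensor in layer_state]
```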


2016-12-20 10:20:44

Note that the way you create num_layers identical cells may not be what you want.
