TensorFlow LSTM: confusion about batching and state

Looking at the official TensorFlow RNN tutorial (full code), I am a bit confused about how the data is distributed within an epoch. First, I don't understand the use of the state variable in the run_epoch() function. In the main() function, while looping over the epochs, we call:

def run_epoch(session, model, eval_op=None, verbose=False):
    """Runs the model on the given data."""
    start_time = time.time()
    costs = 0.0
    iters = 0
    state = session.run(model.initial_state)

    fetches = {
        "cost": model.cost,
        "final_state": model.final_state,
    }
    if eval_op is not None:
        fetches["eval_op"] = eval_op

    for step in range(model.input.epoch_size):
        feed_dict = {}
        for i, (c, h) in enumerate(model.initial_state):
            feed_dict[c] = state[i].c
            feed_dict[h] = state[i].h

        vals = session.run(fetches, feed_dict)
        cost = vals["cost"]
        state = vals["final_state"]

        costs += cost
        iters += model.input.num_steps

        if verbose and step % (model.input.epoch_size // 10) == 10:
            print("%.3f perplexity: %.3f speed: %.0f wps" %
                  (step * 1.0 / model.input.epoch_size, np.exp(costs / iters),
                   iters * model.input.batch_size / (time.time() - start_time)))

    return np.exp(costs / iters)

What is the state variable for, and why do we enumerate over model.initial_state and overwrite state at every step?
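
My understanding so far is that state holds one LSTMStateTuple(c, h) per layer, and that the loop feeds the previous chunk's final_state back in so the LSTM's memory carries across consecutive num_steps-sized chunks. Here is a standalone sketch of the pattern I think is happening, using the TF 1.x API the tutorial uses (the two-layer cell and the shapes are toy values of my own, not the tutorial's):

import numpy as np
import tensorflow as tf

batch_size, num_steps, hidden_size, num_layers = 4, 5, 16, 2

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, hidden_size])
cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(hidden_size) for _ in range(num_layers)])
initial_state = cell.zero_state(batch_size, tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                         initial_state=initial_state)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = sess.run(initial_state)  # numpy LSTMStateTuples, all zeros
    for step in range(3):  # three consecutive num_steps-sized chunks
        chunk = np.random.randn(batch_size, num_steps, hidden_size).astype(np.float32)
        feed_dict = {inputs: chunk}
        for i, (c, h) in enumerate(initial_state):
            feed_dict[c] = state[i].c  # carry memory across chunk boundaries
            feed_dict[h] = state[i].h
        state = sess.run(final_state, feed_dict)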

Also, looking at the reader.py file, the following batches the data:

def ptb_producer(raw_data, batch_size, num_steps, name=None):
    """Iterate on the raw PTB data.

    This chunks up raw_data into batches of examples and returns Tensors that
    are drawn from these batches.

    Args:
        raw_data: one of the raw data outputs from ptb_raw_data.
        batch_size: int, the batch size.
        num_steps: int, the number of unrolls.
        name: the name of this operation (optional).

    Returns:
        A pair of Tensors, each shaped [batch_size, num_steps]. The second
        element of the tuple is the same data time-shifted to the right by one.

    Raises:
        tf.errors.InvalidArgumentError: if batch_size or num_steps are too high.
    """
    with tf.name_scope(name, "PTBProducer", [raw_data, batch_size, num_steps]):
        raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)

        data_len = tf.size(raw_data)
        batch_len = data_len // batch_size
        data = tf.reshape(raw_data[0:batch_size * batch_len],
                          [batch_size, batch_len])

        epoch_size = (batch_len - 1) // num_steps
        assertion = tf.assert_positive(
            epoch_size,
            message="epoch_size == 0, decrease batch_size or num_steps")
        with tf.control_dependencies([assertion]):
            epoch_size = tf.identity(epoch_size, name="epoch_size")

        i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()
        x = tf.slice(data, [0, i * num_steps], [batch_size, num_steps])
        y = tf.slice(data, [0, i * num_steps + 1], [batch_size, num_steps])
        return x, y

Why do we split the data along both a batch dimension and a num_steps dimension and mix the two? This is a bit confusing. Why not just iterate in batches, or just walk over the whole dataset in one pass?
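
To show what I mean, here is what I believe the reshape and slicing produce, as a plain NumPy sketch with toy values (my own example, not the tutorial's code):

import numpy as np

raw_data = np.arange(20)  # stand-in for the stream of word ids
batch_size, num_steps = 2, 3

batch_len = len(raw_data) // batch_size  # 10
data = raw_data[:batch_size * batch_len].reshape(batch_size, batch_len)
# data = [[ 0  1  2  3  4  5  6  7  8  9]
#         [10 11 12 13 14 15 16 17 18 19]]

epoch_size = (batch_len - 1) // num_steps  # 3 slices per epoch
for i in range(epoch_size):
    x = data[:, i * num_steps:(i + 1) * num_steps]
    y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]
    # i == 0: x = [[0 1 2], [10 11 12]], y = [[1 2 3], [11 12 13]]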

Answer


The two are completely different. The num_steps parameter controls how many inputs the network is unrolled over before backpropagation happens (truncated backpropagation through time). Splitting the data into batches gives an efficient implementation, because many sequence segments are processed at once. The batched input is a bit confusing to look at directly; set batch_size to 1 and check what the inputs look like. That is effectively the same as using num_steps alone, but training the RNN then takes much longer.
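
Following that suggestion, a toy version of the producer makes the difference visible (a NumPy sketch of my own that mirrors ptb_producer's reshape-and-slice, not library code):

import numpy as np

def toy_producer(raw_data, batch_size, num_steps):
    # Cut the stream into batch_size parallel rows, then walk along
    # the rows num_steps tokens at a time, as ptb_producer does.
    batch_len = len(raw_data) // batch_size
    data = np.asarray(raw_data[:batch_size * batch_len]).reshape(batch_size, batch_len)
    for i in range((batch_len - 1) // num_steps):
        yield data[:, i * num_steps:(i + 1) * num_steps]

stream = list(range(20))

for x in toy_producer(stream, batch_size=1, num_steps=3):
    print(x)  # [[0 1 2]], [[3 4 5]], ... : one stream, six updates per epoch

for x in toy_producer(stream, batch_size=2, num_steps=3):
    print(x)  # [[0 1 2], [10 11 12]], ... : two rows per update, three updates

Each row advances an independent stream of the text, so batching changes how many streams are processed per update (and hence the wall-clock speed), not the truncation length set by num_steps.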