TensorFlow model restore (resumed training seems to start from scratch)

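I have a problem resuming training after saving my model. My loss decreases, for example, from 6 to 3, and at that point I save the model. When I restore it and continue training, the loss starts again from 6, so the restore does not seem to work. I don't understand why, because when I print the weights they appear to be loaded correctly. I use the Adam optimizer. Thanks in advance.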

    batch_size = self.batch_size
    num_classes = self.num_classes

    n_hidden = 50 #700
    n_layers = 1 #3
    truncated_backprop = self.seq_len
    dropout = 0.3
    learning_rate = 0.001
    epochs = 200

    with tf.name_scope('input'):
        x = tf.placeholder(tf.float32, [batch_size, truncated_backprop], name='x')
        y = tf.placeholder(tf.int32, [batch_size, truncated_backprop], name='y')

    with tf.name_scope('weights'):
        W = tf.Variable(np.random.rand(n_hidden, num_classes), dtype=tf.float32)
        b = tf.Variable(np.random.rand(1, num_classes), dtype=tf.float32)

    inputs_series = tf.split(x, truncated_backprop, 1)
    labels_series = tf.unstack(y, axis=1)

    with tf.name_scope('LSTM'):
        cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, state_is_tuple=True)
        cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=dropout)
        cell = tf.contrib.rnn.MultiRNNCell([cell] * n_layers)

    states_series, current_state = tf.contrib.rnn.static_rnn(
        cell, inputs_series, dtype=tf.float32)

    logits_series = [tf.matmul(state, W) + b for state in states_series]
    prediction_series = [tf.nn.softmax(logits) for logits in logits_series]

    losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
              for logits, labels in zip(logits_series, labels_series)]
    total_loss = tf.reduce_mean(losses)

    train_step = tf.train.AdamOptimizer(learning_rate).minimize(total_loss)

    tf.summary.scalar('total_loss', total_loss)
    summary_op = tf.summary.merge_all()

    loss_list = []
    writer = tf.summary.FileWriter('tf_logs', graph=tf.get_default_graph())

    all_saver = tf.train.Saver()

    with tf.Session() as sess:
        #sess.run(tf.global_variables_initializer())
        tf.reset_default_graph()
        saver = tf.train.import_meta_graph('./models/tf_models/rnn_model.meta')
        saver.restore(sess, './models/tf_models/rnn_model')

        for epoch_idx in range(epochs):
            xx, yy = next(self.get_batch)
            batch_count = len(self.D.chars) // batch_size // truncated_backprop

            for batch_idx in range(batch_count):
                batchX, batchY = next(self.get_batch)

                summ, _total_loss, _train_step, _current_state, _prediction_series = sess.run(
                    [summary_op, total_loss, train_step, current_state, prediction_series],
                    feed_dict={
                        x: batchX,
                        y: batchY
                    })

                loss_list.append(_total_loss)
                writer.add_summary(summ, epoch_idx * batch_count + batch_idx)
                if batch_idx % 5 == 0:
                    print('Step', batch_idx, 'Batch_loss', _total_loss)

                if batch_idx % 50 == 0:
                    all_saver.save(sess, 'models/tf_models/rnn_model')

            if epoch_idx % 5 == 0:
                print('Epoch', epoch_idx, 'Last_loss', loss_list[-1])

Source

2017-04-12 JimZer

So the weights are restored properly, but what about the data? Is it the same?

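@DanevskyiDmytro My data comes in batches. The batches are retrieved in random order, but over the whole dataset (a full epoch) the loss is close to 3. So I would expect the loss to restart at around 3 for any batch after restoring? – JimZer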

Can you restrict your dataset to a few batches, and train and test on just those?
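One way to act on this suggestion is to keep a small fixed batch and compare the loss on it just before saving and just after restoring. The sketch below is only an illustration of that check: it uses a tiny NumPy linear softmax model and `np.savez` as a stand-in for the TensorFlow graph and `tf.train.Saver`, and all names and values in it (`fixed_x`, `fixed_y`, `params.npz`, the random parameters) are hypothetical.

```python
import os
import tempfile

import numpy as np


def softmax_cross_entropy(weights, bias, inputs, labels):
    """Mean sparse softmax cross-entropy of a linear model on one batch."""
    logits = inputs @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


rng = np.random.default_rng(0)

# A small, fixed evaluation batch that is reused unchanged in both runs.
fixed_x = rng.normal(size=(8, 5))
fixed_y = rng.integers(0, 3, size=8)

# Stand-in for the trained parameters (hypothetical values).
weights = rng.normal(size=(5, 3))
bias = rng.normal(size=(1, 3))

loss_before_save = softmax_cross_entropy(weights, bias, fixed_x, fixed_y)

# Save, then reload the parameters as a fresh run would.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "params.npz")
np.savez(checkpoint_path, weights=weights, bias=bias)
restored = np.load(checkpoint_path)

loss_after_restore = softmax_cross_entropy(
    restored["weights"], restored["bias"], fixed_x, fixed_y)

# Same parameters and same batch must give the same loss.
print(loss_before_save, loss_after_restore)
```

If the two numbers match on a fixed batch but the training loss still jumps after a restart, the checkpoint is probably fine and the inputs or labels being fed are the more likely cause.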

My problem was a bug in the code that builds the labels: they changed between two runs. So it works now. Thanks for your help.
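The thread does not say what the label bug actually was. One common way for labels to change between runs, shown here purely as an assumption, is building a character-to-id mapping by iterating over a `set`, whose order for strings can differ from one interpreter run to the next because string hashing is randomized. A minimal sketch of a reproducible mapping, with a hypothetical helper `build_char_to_id` and a made-up corpus:

```python
def build_char_to_id(text):
    """Map each distinct character to an integer id in a reproducible order."""
    # sorted() fixes the order; iterating a bare set of strings can differ
    # between interpreter runs because string hashing is randomized.
    return {ch: idx for idx, ch in enumerate(sorted(set(text)))}


corpus = "hello tensorflow"  # made-up example text
char_to_id = build_char_to_id(corpus)
encoded = [char_to_id[ch] for ch in corpus]
print(char_to_id)
```

With a mapping like this, a restored model sees the same label ids it was trained on. Saving the mapping to disk next to the checkpoint would be another way to get the same guarantee.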

Source

2017-04-13 11:16:54 JimZer

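I had the same problem. In my case the model was restored correctly, but the loss kept starting out really high every time. The problem was that my batch retrieval was not random. I have three classes, A, B and C, and my data was being fed as all of A, then all of B, then all of C. I don't know whether this is your problem, but you have to make sure that every batch you give your model contains all of your classes, so in your case each batch should have batch_size/num_classes inputs per class. I changed that and everything works perfectly :)

Check that you are feeding your model correctly.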
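The batching described in this answer could be sketched as follows for a plain classification dataset. This is an illustration under assumptions, not the answerer's code: `balanced_batches` is a hypothetical helper, the labels are toy data sorted as A, B, C, and leftover examples that do not fill a complete batch are dropped.

```python
import numpy as np


def balanced_batches(labels, batch_size, num_classes, rng):
    """Yield index arrays with batch_size // num_classes examples per class."""
    per_class = batch_size // num_classes
    # Shuffle the indices of each class independently.
    class_indices = [rng.permutation(np.flatnonzero(labels == c))
                     for c in range(num_classes)]
    # The smallest class limits how many complete batches can be built.
    num_batches = min(len(idx) for idx in class_indices) // per_class
    for b in range(num_batches):
        batch = np.concatenate(
            [idx[b * per_class:(b + 1) * per_class] for idx in class_indices])
        yield rng.permutation(batch)  # mix the classes within the batch


labels = np.array([0] * 6 + [1] * 6 + [2] * 6)  # sorted as A, then B, then C
rng = np.random.default_rng(0)
batches = list(balanced_batches(labels, batch_size=6, num_classes=3, rng=rng))
```

Each yielded array indexes into the dataset, for example `inputs[batch]` and `labels[batch]`. For a character-level sequence model like the one in the question, per-class balancing does not apply directly, and shuffling the order of the sequences is the closer equivalent.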

Source

2017-04-12 14:00:49

Thanks for the tip, but my batches are loaded in random order at each epoch... – JimZer
