2017-06-22

TensorFlow Seq2Seq training time per minibatch increases monotonically

I am training a tensorflow.contrib.seq2seq encoder-decoder model, and the training time per minibatch increases monotonically:

Step Number: 10 Elapsed time: 52.89215302467346 Loss: 1.0420862436294556 Metrics: {'accuracy': 0.22499999}
Step Number: 20 Elapsed time: 60.28505992889404 Loss: 0.8007364869117737 Metrics: {'accuracy': 0.28}
Step Number: 30 Elapsed time: 73.98479580879211 Loss: 0.7292348742485046 Metrics: {'accuracy': 0.34}
Step Number: 40 Elapsed time: 82.99069213867188 Loss: 0.6843382120132446 Metrics: {'accuracy': 0.345}
Step Number: 50 Elapsed time: 86.97363901138306 Loss: 0.6808319687843323 Metrics: {'accuracy': 0.38999999}
Step Number: 60 Elapsed time: 106.96697807312012 Loss: 0.601255476474762 Metrics: {'accuracy': 0.44}
Step Number: 70 Elapsed time: 124.17725801467896 Loss: 0.5971778035163879 Metrics: {'accuracy': 0.405}
Step Number: 80 Elapsed time: 137.91252613067627 Loss: 0.596596896648407 Metrics: {'accuracy': 0.43000001}
Step Number: 90 Elapsed time: 146.6834409236908 Loss: 0.5921837687492371 Metrics: {'accuracy': 0.42500001}

All of my data is artificially generated and randomly sampled, which means that (in general) there should be no difference between minibatches early in training and minibatches later in training. Additionally, all of my data has the same input sequence length and the same output sequence length. Why does my model take longer to train on later minibatches?

I found this related post, but I am not changing my computational graph inside my training loop.

To show some code, let's start in main:

def main(_): 
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_data_pipeline() 

    model = import_model() 

    train(model=model, x_minibatch=x_minibatch, y_minibatch=y_minibatch, y_lengths_minibatch=y_lengths_minibatch) 


My data is stored as SequenceExamples, one per TFRecord file. My construct_data_pipeline() function is defined as follows:

def construct_data_pipeline():
    # extract TFRecord filenames located in data directory
    tfrecord_filenames = []
    for dirpath, dirnames, filenames in os.walk(tf.app.flags.FLAGS.data_dir):
        for filename in filenames:
            if filename.endswith('.tfrecord'):
                tfrecord_filenames.append(os.path.join(dirpath, filename))

    # read and parse data from TFRecords into tensors
    x, y, x_len, y_len = construct_examples_queue(tfrecord_filenames)

    # group tensors into minibatches
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_minibatches(x=x, y=y,
                                                                          y_len=y_len,
                                                                          x_len=x_len)

    return x_minibatch, y_minibatch, y_lengths_minibatch

Stepping into construct_examples_queue():

def construct_examples_queue(tfrecords_filenames):
    number_of_readers = tf.flags.FLAGS.number_of_readers

    with tf.name_scope('examples_queue'):
        key, example_serialized = tf.contrib.slim.parallel_reader.parallel_read(tfrecords_filenames,
                                                                                tf.TFRecordReader,
                                                                                num_readers=number_of_readers)

        x, y, x_len, y_len = parse_example(example_serialized)

        return x, y, x_len, y_len

I don't think I can show parse_example, since the data isn't my own. The main part is that I specify what I expect the SequenceExample to contain, and then call

context_parsed, sequence_parsed = tf.parse_single_sequence_example(example_serialized, 
                    context_features=context_features, 
                    sequence_features=sequence_features) 
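
For reference, the feature specs passed to that call look roughly like the following; this is only a placeholder sketch, since the real feature names, dtypes, and shapes aren't something I can share:

# Placeholder feature specs for tf.parse_single_sequence_example.
# The actual names, dtypes, and shapes of my data are not shown here.
context_features = {
    'x_length': tf.FixedLenFeature([], dtype=tf.int64),
    'y_length': tf.FixedLenFeature([], dtype=tf.int64),
}
sequence_features = {
    'x': tf.FixedLenSequenceFeature([], dtype=tf.int64),
    'y': tf.FixedLenSequenceFeature([], dtype=tf.int64),
}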

Jumping straight to how I construct minibatches, I use

def construct_minibatches(x, y, y_len, x_len,
                          bucket_boundaries=list(range(400, tf.app.flags.FLAGS.max_x_len, 100))):

    batch_size = tf.app.flags.FLAGS.batch_size

    with tf.name_scope('batch_examples_using_buckets'):
        _, outputs = tf.contrib.training.bucket_by_sequence_length(input_length=x_len,
                                                                   tensors=[x, y, y_len],
                                                                   batch_size=batch_size,
                                                                   bucket_boundaries=bucket_boundaries,
                                                                   dynamic_pad=True,
                                                                   capacity=2 * batch_size,
                                                                   allow_smaller_final_batch=True)

        x_minibatch = outputs[0]
        y_minibatch = outputs[1]
        y_lengths_minibatch = outputs[2]
        return x_minibatch, y_minibatch, y_lengths_minibatch

Note: I had to change some variable names for privacy reasons. Hopefully I haven't introduced any mistakes.


Silly question, but are you sure that isn't the elapsed time since training started? What produces "Elapsed time"? – vega


The loss is also steadily decreasing. – NRitH


Elapsed time is initialized with 'start_time = time.time()'. Then, after training on 10 minibatches, I call 'print(time.time() - start_time)' followed by 'start_time = time.time()'. –
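
In other words, the timing follows roughly this pattern (a minimal sketch of the measurement described in that comment; the loop structure and names like num_steps and run_one_training_step are assumptions):

import time

start_time = time.time()
for step in range(1, num_steps + 1):        # num_steps is assumed
    run_one_training_step()                 # stands in for the actual sess.run training call
    if step % 10 == 0:
        print('Step Number:', step, 'Elapsed time:', time.time() - start_time)
        start_time = time.time()            # reset, so each print covers only the last 10 steps

Because the timer is reset after every print, each reported Elapsed time covers only the preceding 10 minibatches, so the growing numbers really do mean that later minibatches are slower.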

Answer


Credit to faddy-w for solving both of my problems at once!

It turns out I was changing my computational graph without knowing it.

I was calling

sess.run([model.optimizer.minimize(model.loss), model.y_predicted_logits],
         feed_dict={model.x: x_values,
                    model.y_actual: y_values,
                    model.y_actual_lengths: y_lengths_values})

from inside a loop, where

model.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=self.y_actual, 
                     logits=self.y_predicted_logits)) 

model.optimizer = tf.train.GradientDescentOptimizer(learning_rate=initial_learning_rate) 

I didn't know that optimizer.minimize() adds extra operations to the graph.
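
The fix is to build the training op once, outside the loop, and only run it inside the loop. A minimal sketch of that pattern, reusing the names from the snippets above (the loop details and num_steps are assumptions):

# Build all ops once, before the training loop.
train_op = model.optimizer.minimize(model.loss)
init_op = tf.global_variables_initializer()

# Optionally freeze the graph so any accidental op creation inside the
# loop raises an error instead of silently slowing training down.
tf.get_default_graph().finalize()

sess.run(init_op)
for step in range(1, num_steps + 1):   # num_steps is assumed
    _, y_predicted = sess.run([train_op, model.y_predicted_logits],
                              feed_dict={model.x: x_values,
                                         model.y_actual: y_values,
                                         model.y_actual_lengths: y_lengths_values})

With minimize() called only once, every sess.run executes the same ops and per-minibatch time stays roughly constant; calling it inside the loop kept appending new gradient ops to the graph, which is why each later step took longer.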