TensorFlow Seq2Seq training time per minibatch monotonically increases

I am training a tensorflow.contrib.seq2seq encoder-decoder model, and the training time per minibatch is monotonically increasing:
```
Step Number: 10 Elapsed time: 52.89215302467346 Loss: 1.0420862436294556 Metrics: {'accuracy': 0.22499999}
Step Number: 20 Elapsed time: 60.28505992889404 Loss: 0.8007364869117737 Metrics: {'accuracy': 0.28}
Step Number: 30 Elapsed time: 73.98479580879211 Loss: 0.7292348742485046 Metrics: {'accuracy': 0.34}
Step Number: 40 Elapsed time: 82.99069213867188 Loss: 0.6843382120132446 Metrics: {'accuracy': 0.345}
Step Number: 50 Elapsed time: 86.97363901138306 Loss: 0.6808319687843323 Metrics: {'accuracy': 0.38999999}
Step Number: 60 Elapsed time: 106.96697807312012 Loss: 0.601255476474762 Metrics: {'accuracy': 0.44}
Step Number: 70 Elapsed time: 124.17725801467896 Loss: 0.5971778035163879 Metrics: {'accuracy': 0.405}
Step Number: 80 Elapsed time: 137.91252613067627 Loss: 0.596596896648407 Metrics: {'accuracy': 0.43000001}
Step Number: 90 Elapsed time: 146.6834409236908 Loss: 0.5921837687492371 Metrics: {'accuracy': 0.42500001}
```
All of my data is artificially generated and randomly sampled, which means that (in general) there should be no difference between minibatches early in training and minibatches later in training. Additionally, all of my data has the same input sequence length and the same output sequence length. Why does my model take longer to train on later minibatches?

I found this related post, but I am not changing the computation graph inside my training loop.
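One way to verify that (a minimal sketch, not tied to this specific model): finalize the graph after construction and before the training loop, so that any op accidentally added inside the loop raises an error instead of silently growing the graph:

```python
import tensorflow as tf

# Finalizing makes the default graph read-only. If anything inside the
# training loop tries to add an op (a common cause of steadily increasing
# step times), TensorFlow raises a RuntimeError instead of silently
# growing the graph.
tf.get_default_graph().finalize()

# ... create the session, start queue runners, and run the training loop ...
```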
To show some code, let's start in `main`:
```python
def main(_):
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_data_pipeline()
    model = import_model()
    train(model=model,
          x_minibatch=x_minibatch,
          y_minibatch=y_minibatch,
          y_lengths_minibatch=y_lengths_minibatch)
```
My data is stored as `SequenceExample`s, one per `TFRecord` file. My `construct_data_pipeline()` function is defined as follows:
```python
def construct_data_pipeline():
    # extract TFRecord filenames located in data directory
    tfrecord_filenames = []
    for dirpath, dirnames, filenames in os.walk(tf.app.flags.FLAGS.data_dir):
        for filename in filenames:
            if filename.endswith('.tfrecord'):
                tfrecord_filenames.append(os.path.join(dirpath, filename))

    # read and parse data from TFRecords into tensors
    x, y, x_len, y_len = construct_examples_queue(tfrecord_filenames)

    # group tensors into minibatches
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_minibatches(x=x, y=y,
                                                                          y_len=y_len,
                                                                          x_len=x_len)

    return x_minibatch, y_minibatch, y_lengths_minibatch
```
Stepping into `construct_examples_queue()`:
```python
def construct_examples_queue(tfrecords_filenames):
    number_of_readers = tf.flags.FLAGS.number_of_readers

    with tf.name_scope('examples_queue'):
        key, example_serialized = tf.contrib.slim.parallel_reader.parallel_read(tfrecords_filenames,
                                                                                tf.TFRecordReader,
                                                                                num_readers=number_of_readers)
        x, y, x_len, y_len = parse_example(example_serialized)

    return x, y, x_len, y_len
```
I don't think I can show `parse_example`, since the data isn't my own. The main parts are that I specify what I expect the `SequenceExample` to contain and then call:
```python
context_parsed, sequence_parsed = tf.parse_single_sequence_example(example_serialized,
                                                                   context_features=context_features,
                                                                   sequence_features=sequence_features)
```
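For reference, the feature specs passed in look something like the following (a minimal sketch with hypothetical feature names, dtypes, and shapes, not my actual schema):

```python
# Hypothetical feature specs for tf.parse_single_sequence_example.
# The real keys, dtypes, and shapes are specific to my data.
context_features = {
    'x_len': tf.FixedLenFeature([], dtype=tf.int64),
    'y_len': tf.FixedLenFeature([], dtype=tf.int64),
}
sequence_features = {
    'x': tf.FixedLenSequenceFeature([], dtype=tf.int64),
    'y': tf.FixedLenSequenceFeature([], dtype=tf.int64),
}
```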
Skipping ahead to how I construct minibatches, I use:
```python
def construct_minibatches(x, y, y_len, x_len,
                          bucket_boundaries=list(range(400, tf.app.flags.FLAGS.max_x_len, 100))):
    batch_size = tf.app.flags.FLAGS.batch_size

    with tf.name_scope('batch_examples_using_buckets'):
        _, outputs = tf.contrib.training.bucket_by_sequence_length(input_length=x_len,
                                                                   tensors=[x, y, y_len],
                                                                   batch_size=batch_size,
                                                                   bucket_boundaries=bucket_boundaries,
                                                                   dynamic_pad=True,
                                                                   capacity=2 * batch_size,
                                                                   allow_smaller_final_batch=True)

        x_minibatch = outputs[0]
        y_minibatch = outputs[1]
        y_lengths_minibatch = outputs[2]

        return x_minibatch, y_minibatch, y_lengths_minibatch
```
Note: I had to change some variable names due to privacy concerns. Hopefully I didn't make any mistakes.
Silly question, but are you sure the elapsed time isn't measured since the start of training? What produces the "Elapsed time"? – vega
The loss is also steadily decreasing. – NRitH
Elapsed time is initialized with `start_time = time.time()`. Then, after training on 10 minibatches, I call `print(time.time() - start_time)` and then `start_time = time.time()`. –
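In other words, the timing follows roughly this pattern (a minimal sketch of the loop described above; `max_steps`, `train_op`, `loss_op`, and `metrics_op` are hypothetical stand-ins for the actual ops):

```python
import time
import tensorflow as tf

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # parallel_read and bucket_by_sequence_length are queue-based, so the
    # queue runners must be started before any minibatch can be dequeued.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    start_time = time.time()
    for step in range(1, max_steps + 1):  # max_steps is hypothetical
        _, loss, metrics = sess.run([train_op, loss_op, metrics_op])
        if step % 10 == 0:
            # The timer is reset every 10 minibatches, so each printed
            # value covers only the most recent 10 steps.
            print('Step Number:', step, 'Elapsed time:', time.time() - start_time,
                  'Loss:', loss, 'Metrics:', metrics)
            start_time = time.time()

    coord.request_stop()
    coord.join(threads)
```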