TensorFlow buffer underruns and resource exhausted errors

I'm in high school and working on a project involving neural networks. I'm on Ubuntu, trying to do reinforcement learning with TensorFlow, but whenever I train the network I get a flood of underrun warnings of the form ALSA lib pcm.c:7963:(snd_pcm_recover) underrun occurred. As training goes on, the message appears more and more frequently, and eventually I get a ResourceExhaustedError and the program terminates. Here is the full error output:

W tensorflow/core/framework/op_kernel.cc:975] Resource exhausted: OOM when allocating tensor with shape[320000,512] 
Traceback (most recent call last): 
    File "./train.py", line 121, in <module> 
    loss, _ = model.train(minibatch, gamma, sess) # Train the model based on the batch, the discount factor, and the tensorflow session. 
    File "/home/perrin/neural/dqn.py", line 174, in train 
    return sess.run([self.loss, self.optimize], feed_dict=self.feed_dict) # Runs the training. This is where the underrun errors happen 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run 
    run_metadata_ptr) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run 
    feed_dict_string, options, run_metadata) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run 
    target_list, options, run_metadata) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call 
    raise type(e)(node_def, op, message) 
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[320000,512] 
    [[Node: gradients/fully_connected/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/cpu:0"](dropout/mul, gradients/fully_connected/BiasAdd_grad/tuple/control_dependency)]] 

Caused by op u'gradients/fully_connected/MatMul_grad/MatMul_1', defined at: 
    File "./train.py", line 72, in <module> 
    model = AC_Net([None, 201, 201, 3], 5, trainer) # This creates the neural network using the imported AC_Net class. 
    File "/home/perrin/neural/dqn.py", line 128, in __init__ 
    self.optimize = trainer.minimize(self.loss) # This tells the trainer to adjust the weights in such a way as to minimize the loss. This is what actually 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 269, in minimize 
    grad_loss=grad_loss) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 335, in compute_gradients 
    colocate_gradients_with_ops=colocate_gradients_with_ops) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 482, in gradients 
    in_grads = grad_fn(op, *out_grads) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_grad.py", line 731, in _MatMulGrad 
    math_ops.matmul(op.inputs[0], grad, transpose_a=True)) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1729, in matmul 
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1442, in _mat_mul 
    transpose_b=transpose_b, name=name) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op 
    op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op 
    original_op=self._default_original_op, op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__ 
    self._traceback = _extract_stack() 

...which was originally created as op u'fully_connected/MatMul', defined at: 
    File "./train.py", line 72, in <module> 
    model = AC_Net([None, 201, 201, 3], 5, trainer) # This creates the neural network using the imported AC_Net class. 
    File "/home/perrin/neural/dqn.py", line 63, in __init__ 
    net = slim.fully_connected(net, 512, activation_fn=tf.nn.elu, scope='fully_connected') # Feeds the input through a fully connected layer 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args 
    return func(*args, **current_args) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1350, in fully_connected 
    outputs = standard_ops.matmul(inputs, weights) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1729, in matmul 
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1442, in _mat_mul 
    transpose_b=transpose_b, name=name) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op 
    op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op 
    original_op=self._default_original_op, op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__ 
    self._traceback = _extract_stack() 

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[320000,512] 
    [[Node: gradients/fully_connected/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/cpu:0"](dropout/mul, gradients/fully_connected/BiasAdd_grad/tuple/control_dependency)]] 

I've looked into these problems but haven't figured out how to fix them. I'm fairly new to programming, so I don't have a good grasp of how buffers and data reading/writing work, and these errors have me stumped. Does anyone know which parts of my code might be causing this, and how to fix it? Thanks for taking the time to consider this question!

Here is the code where I define the neural network (based on this tutorial):

#! /usr/bin/python 

import numpy as np 
import tensorflow as tf 
slim = tf.contrib.slim 

# The neural network 
class AC_Net: 
    # This defines the actual neural network. 
    # output_size: the number of outputs of the policy 
    # trainer: the tensorflow training optimizer used by the network 
    def __init__(self, input_shape, output_size, trainer): 

     with tf.name_scope('input'): 
      self.input = tf.placeholder(shape=list(input_shape), dtype=tf.float32, name='input') 
      net = tf.image.per_image_standardization(self.input[0]) 
      net = tf.expand_dims(net, [0]) 

     with tf.name_scope('convolution'): 
      net = slim.conv2d(net, 32, [8, 8], activation_fn=tf.nn.elu, scope='conv') 
      net = slim.max_pool2d(net, [2, 2], scope='pool') 

     net = slim.flatten(net) 
     net = tf.nn.dropout(net, .5) 
     net = slim.fully_connected(net, 512, activation_fn=tf.nn.elu, scope='fully_connected') 
     net = tf.nn.dropout(net, .5) 

     with tf.name_scope('LSTM'): 
      cell = tf.nn.rnn_cell.BasicLSTMCell(256, state_is_tuple=True, activation=tf.nn.elu) 

      with tf.name_scope('state_in'): 
       state_in = cell.zero_state(tf.shape(net)[0], tf.float32) 

      net = tf.expand_dims(net, [0]) 
      step_size = tf.shape(self.input)[:1] 
      output, state = tf.nn.dynamic_rnn(cell, net, initial_state=state_in, sequence_length=step_size, time_major=False, scope='LSTM') 

     out = tf.reshape(output, [-1, 256]) 
     out = tf.nn.dropout(out, .5) 
     self.policy = slim.fully_connected(out, output_size, activation_fn=tf.nn.softmax, scope='policy') 

     self.value = slim.fully_connected(out, 1, activation_fn=None, scope='value') 

     # Defines the loss functions 
     with tf.name_scope('loss_function'): 
      self.target_values = tf.placeholder(dtype=tf.float32, name='target_values') # The target value is the discounted reward. 
      self.actions = tf.placeholder(dtype=tf.int32, name='actions') # This is the network's policy. 
      # The advantage is the difference between what the network thought the value of an action was, and what it actually was. 
      # It is computed as R - V(s), where R is the discounted reward and V(s) is the value of being in state s. 
      self.advantages = tf.placeholder(dtype=tf.float32, name='advantages') 

      with tf.name_scope('entropy'): 
       entropy = -tf.reduce_sum(tf.log(self.policy + 1e-10) * self.policy) 
      with tf.name_scope('responsible_actions'): 
       actions_onehot = tf.one_hot(self.actions, output_size, dtype=tf.float32)  
       responsible_actions = tf.reduce_sum(self.policy * actions_onehot, [1]) # This returns only the actions that were selected. 

      with tf.name_scope('loss'): 

       with tf.name_scope('value_loss'): 
        self.value_loss = tf.reduce_sum(tf.square(self.target_values - tf.reshape(self.value, [-1]))) 

       with tf.name_scope('policy_loss'): 
        self.policy_loss = -tf.reduce_sum(tf.log(responsible_actions + 1e-10) * self.advantages) 

       with tf.name_scope('total_loss'): 
        self.loss = self.value_loss + self.policy_loss - entropy * .01 

       tf.summary.scalar('loss', self.loss) 

     with tf.name_scope('gradient_clipping'): 
      tvars = tf.trainable_variables() 
      grads = tf.gradients(self.loss, tvars)   
      grads, _ = tf.clip_by_global_norm(grads, 20.) 
     self.optimize = trainer.apply_gradients(zip(grads, tvars)) 

    def predict(self, inputs, sess): 
     return sess.run([self.policy, self.value], feed_dict={self.input:inputs}) 

    def train(self, train_batch, gamma, sess): 

     inputs = train_batch[:, 0] 
     actions = train_batch[:, 1] 
     rewards = train_batch[:, 2] 
     values = train_batch[:, 4] 

     discounted_rewards = rewards[::-1] 
     for i, j in enumerate(discounted_rewards): 
      if i > 0: 
       discounted_rewards[i] += discounted_rewards[i - 1] * gamma 
     discounted_rewards = np.array(discounted_rewards, np.float32)[::-1] 
     advantages = discounted_rewards - values 
     self.feed_dict = { 
       self.input:np.vstack(inputs), 
       self.target_values:discounted_rewards, 
       self.actions:actions, 
       self.advantages:advantages 
       } 
     return sess.run([self.loss, self.optimize], feed_dict=self.feed_dict) 

Here is my code for training the neural network:

#! /usr/bin/python 

import game_env, move_right, move_right_with_obs, random, inspect, os 
import tensorflow as tf 
import numpy as np 
from dqn import AC_Net 

def process_outputs(x): 
    a = [int(x > 2), int(x%2 == 0 and x > 0)*2-int(x > 0)] 
    return a 

environment = game_env # The environment to use 
env_name = str(inspect.getmodule(environment).__name__) # The name of the environment 

ep_length = 2000 
num_episodes = 20 

total_steps = ep_length * num_episodes # The total number of steps 
model_path = '/home/perrin/neural/nn/' + env_name 

learning_rate = 1e-4 # The learning rate 
trainer = tf.train.AdamOptimizer(learning_rate=learning_rate) # The gradient descent optimizer used 
first_epsilon = 0.6 # The initial chance of random action 
final_epsilon = 0.01 # The final chance of random action 
gamma = 0.9 
anneal_steps = 35000 # The number of steps it takes to go from initial to random 

count = 0 # Keeps track of the number of steps we've run 
experience_buffer = [] # Stores the agent's experiences in a list 
buffer_size = 10000 # How large the experience buffer can be 
train_step = 256 # How often to train the model 
batches_per_train = 10 
save_step = 500 # How often to save the trained model 
batch_size = 256 # How many experiences to train on at once 
env_size = 500 # How many pixels tall and wide the environment should be. 
load_model = True # Whether or not to load a pretrained model 
train = True # Whether or not to train the model 
test = False # Whether or not to test the model 

tf.reset_default_graph() 

sess = tf.InteractiveSession() 

model = AC_Net([None, 201, 201, 3], 5, trainer) 
env = environment.Env(env_size) 
action = [0, 0] 
state, _ = env.step(True, action) 

saver = tf.train.Saver() # This saves the model 
epsilon = first_epsilon 
tf.global_variables_initializer().run() 

if load_model: 
    ckpt = tf.train.get_checkpoint_state(model_path) 
    saver.restore(sess, ckpt.model_checkpoint_path) 
    print 'Model loaded' 

prev_out = None 

while count <= total_steps and train: 

    if random.random() < epsilon or count == 0: 
     if prev_out is not None: 
      out = prev_out 
     if random.randint(0, 100) == 100 or prev_out is None: 
      out = np.random.rand(5) 
      out = np.array([val/np.sum(out) for val in out]) 
      _, value = model.predict(state, sess) 
      prev_out = out 

    else: 
     out, value = model.predict(state, sess) 
     out = out[0] 
    act = np.random.choice(out, p=out) 
    act = np.argmax(out == act) 
    act1 = process_outputs(act) 
    action[act1[0]] = act1[1] 
    _, reward = env.step(True, action) 
    new_state = env.get_state() 

    experience_buffer.append((state, act, reward, new_state, value[0, 0])) 

    state = new_state 

    if len(experience_buffer) > buffer_size: 
     experience_buffer.pop(0) 

    if count % train_step == 0 and count > 0: 
     print "Training model" 
     for i in range(batches_per_train): 
     # Get a random sample of experiences and train the model based on it. 
      x = random.randint(0, len(experience_buffer)-batch_size) 
      minibatch = np.array(experience_buffer[x:x+batch_size]) 
      loss, _ = model.train(minibatch, gamma, sess) 
      print "Loss for batch", str(i+1) + ":", loss 


    if count % save_step == 0 and count > 0: 
     saver.save(sess, model_path+'/model-'+str(count)+'.ckpt') 
     print "Model saved" 

    if count % ep_length == 0 and count > 0: 
     print "Starting new episode" 
     env = environment.Env(env_size) 

    if epsilon > final_epsilon: 
     epsilon -= (first_epsilon - final_epsilon)/anneal_steps 

    count += 1 

while count <= total_steps and test: 
    out, _ = model.predict(state, sess) 
    out = out[0] 
    act = np.random.choice(out, p=out) 
    act = np.argmax(out == act) 
    act1 = process_outputs(act) 
    action[act1[0]] = act1[1] 
    state, reward = env.step(True, action) 
    new_state = env.get_state() 
    count += 1 

# Write log files to create tensorboard visualizations 
merged = tf.summary.merge_all() 
writer = tf.summary.FileWriter('/home/perrin/neural/summaries', sess.graph) 
if train: 
    summary = sess.run(merged, feed_dict=model.feed_dict) 
    writer.add_summary(summary) 
writer.flush() 

You are running out of memory; can you try a smaller batch size? –


@YaroslavBulatov Thanks for the suggestion. I tried a batch size of 10, but I still got all the same errors. – CyborgOctopus


What about a batch size of 1? If you're running out of memory, you need to make the network smaller or use a machine with more memory. –

Answer


You are running out of memory. Your network probably needs more memory than you have available, so the first step in tracking down excessive memory usage is to figure out what is using so much memory.

Here's one approach, using timelines and StatsSummarizer: https://gist.github.com/yaroslavvb/08afccbe087171881ceafc0c98abca05

This prints out several tables; one of them lists tensors sorted by memory usage, largest first. Check that you don't have anything unexpectedly large in there.

You can also visualize the timeline in Chrome, as detailed here.
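
For reference, a minimal sketch of capturing that trace with the TF 1.x Session API might look like the following; it reuses the sess and model objects from your train.py, and the output path /tmp/timeline.json is just an example:

# Capture per-op stats for one training step and write a Chrome trace.
import tensorflow as tf
from tensorflow.python.client import timeline

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

# Same call as in dqn.py's train(), but with tracing enabled.
loss, _ = sess.run([model.loss, model.optimize],
                   feed_dict=model.feed_dict,
                   options=run_options,
                   run_metadata=run_metadata)

# Load the resulting file in chrome://tracing to see per-op timing and memory.
trace = timeline.Timeline(run_metadata.step_stats)
with open('/tmp/timeline.json', 'w') as f:
    f.write(trace.generate_chrome_trace_format(show_memory=True))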

A more advanced technique is to plot a timeline of memory allocation/deallocation ops in order to see memory usage over time, as described in this issue.

In theory, your memory usage should not grow between steps if you are not creating new stateful ops (Variables), but I have found that global memory allocation can grow if the sizes of your tensors change between steps.
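
On that note, one way to catch accidental graph growth is to freeze the graph before the training loop. This is only a sketch, not something your code needs verbatim; note that tf.summary.merge_all() at the end of your script also adds ops, so it would have to be moved above the finalize call:

# Optional guard: mark the graph read-only before entering the training loop.
# Any code that later tries to add new ops (a common cause of quietly growing
# memory) will then raise a RuntimeError instead of silently growing the graph.
sess.graph.finalize()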

A workaround is to periodically save your parameters to a checkpoint and restart the script.
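
As a rough sketch of that workaround, reusing the saver and model_path you already have in train.py (the steps_per_run value and the outer shell loop are purely illustrative):

# Hypothetical restart-friendly structure: restore the newest checkpoint if one
# exists, run a bounded number of steps, save, and exit so the process (and its
# memory) can be restarted cleanly, e.g. by `while true; do ./train.py; done`.
steps_per_run = 5000  # illustrative

ckpt = tf.train.get_checkpoint_state(model_path)
if ckpt is not None:
    saver.restore(sess, ckpt.model_checkpoint_path)

for _ in range(steps_per_run):
    # ... one environment/training step, exactly as in the existing main loop ...
    count += 1

saver.save(sess, model_path + '/model-' + str(count) + '.ckpt')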