2016-12-30 138 views

TensorFlow loss is nan while training an RNN

I am running an RNN with a single GRU cell, and at some point I get the following stack trace:

Traceback (most recent call last): 
    File "language_model_test.py", line 15, in <module> 
    test_model() 
    File "language_model_test.py", line 12, in test_model 
    model.train(random_data, s) 
    File "/home/language_model/language_model.py", line 120, in train 
    train_pp = self._run_epoch(data, sess, inputs, rnn_ouputs, loss, trainOp, verbose) 
    File "/home/language_model/language_model.py", line 92, in _run_epoch 
    loss, _= sess.run([loss, trainOp], feed_dict=feed) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 767, in run 
    run_metadata_ptr) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 952, in _run 
    fetch_handler = _FetchHandler(self._graph, fetches, feed_dict_string) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 408, in __init__ 
    self._fetch_mapper = _FetchMapper.for_fetch(fetches) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 230, in for_fetch 
    return _ListFetchMapper(fetch) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 337, in __init__ 
    self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches] 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 238, in for_fetch 
    return _ElementFetchMapper(fetches, contraction_fn) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 271, in __init__ 
    % (fetch, type(fetch), str(e))) 
TypeError: Fetch argument nan has invalid type <type 'numpy.float32'>, must be a string or Tensor. (Can not convert a float32 into a Tensor or Operation.) 

The step that computes the loss appears to be where the problem is. Here is the relevant code:
def train(self,data, session=tf.Session(), verbose=10): 

     print "initializing model" 
     self._add_placeholders() 
     inputs = self._add_embedding() 
     rnn_ouputs, _ = self._run_rnn(inputs) 
     outputs = self._projection_layer(rnn_ouputs) 
     loss = self._compute_loss(outputs) 
     trainOp = self._add_train_step(loss) 
     start = tf.initialize_all_variables() 
     saver = tf.train.Saver() 

     with session as sess: 
      sess.run(start) 

      for epoch in xrange(self._max_epochs): 
       train_pp = self._run_epoch(data, sess, inputs, rnn_ouputs, loss, trainOp, verbose) 
       print "Training preplexity for batch {} - {}".format(epoch, train_pp) 

Here is the code for _run_epoch, where the loss comes back as nan:

def _run_epoch(self, data, session, inputs, rnn_ouputs, loss, trainOp, verbose=10): 
    with session.as_default() as sess: 
     total_steps = sum(1 for x in data_iterator(data, self._batch_size, self._max_steps)) 
     train_loss = [] 
     for step, (x,y, l) in enumerate(data_iterator(data, self._batch_size, self._max_steps)): 
      print "step - {0}".format(step) 
      feed = { 
       self.input_placeholder: x, 
       self.label_placeholder: y, 
       self.sequence_length: l, 
       self._dropout_placeholder: self._dropout, 
      } 
      loss, _= sess.run([loss, trainOp], feed_dict=feed) 
      print "loss - {0}".format(loss) 
      train_loss.append(loss) 
      if verbose and step % verbose == 0: 
       sys.stdout.write('\r{}/{} : pp = {}'. format(step, total_steps, np.exp(np.mean(train_loss)))) 
       sys.stdout.flush() 
      if verbose: 
       sys.stdout.write('\r') 

     return np.exp(np.mean(train_loss)) 

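This happens when I test my code with random_data = np.random.normal(0, 100, size=[42068, 46]) as my data, which is meant to simulate word IDs being passed in as input. The rest of my code can be found in the following gist.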

Edit: here is how I run the test that produces this problem:

def test_model(): 
    model = Language_model(vocab=range(0,101)) 
    s = tf.Session() 
    # 1 more than the step size to accommodate the <eos> token at the end
    random_data = np.random.normal(0, 100, size=[42068, 46]) 
    # file = "./data/ptb.test.txt" 
    print "Fitting started" 
    model.train(random_data, s) 

if __name__ == "__main__": 
    test_model() 

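If I feed random_data into other language models, they also output nan for the cost. My understanding is that, when the values are passed in through the feed dictionary, TensorFlow should take each numeric value and retrieve the embedding vector that corresponds to that id. I don't understand why random_data causes nan in other models as well.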

Answer


There are several problems with the code above. Let's start with this line:

random_data = np.random.normal(0, 100, size=[42068, 46]) 

np.random.normal(...) does not produce discrete values; it produces floating-point values. Let's try the example above, but with a manageable size.

>>> np.random.normal(0, 100, size=[5]) 
array([-53.12407229, 39.57335574, -98.25406749, 90.81471139, -41.05069646]) 

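There is no way a machine learning algorithm can learn from these. They are meant to be inputs to an embedding lookup, which expects non-negative integer ids, but what we have are floating-point values, some of them negative.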
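The point above can be checked before any training starts. The following is a minimal sketch, assuming a vocabulary of 101 ids (matching vocab=range(0, 101) in the test); check_word_ids is a hypothetical helper written for illustration and is not part of the asker's code or of TensorFlow.

```python
import numpy as np

VOCAB_SIZE = 101  # assumption: ids 0..100, matching vocab=range(0, 101)


def check_word_ids(data, vocab_size):
    """Hypothetical helper: list the reasons `data` cannot serve as word ids."""
    problems = []
    data = np.asarray(data)
    if not np.issubdtype(data.dtype, np.integer):
        problems.append("dtype is %s, not an integer type" % data.dtype)
    if data.min() < 0:
        problems.append("contains negative values")
    if data.max() >= vocab_size:
        problems.append("contains values >= vocab_size")
    return problems


rng = np.random.RandomState(0)
# Smaller than the original [42068, 46] array, same distribution.
bad_data = rng.normal(0, 100, size=[1000, 46])
good_data = rng.randint(0, VOCAB_SIZE, size=[1000, 46])

bad_report = check_word_ids(bad_data, VOCAB_SIZE)    # three problems found
good_report = check_word_ids(good_data, VOCAB_SIZE)  # no problems found

for line in bad_report:
    print(line)
```

Running a check like this on the data fed to the model makes the failure visible as a clear message instead of a nan loss several layers later.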

What you actually want is the following code:

random_data = np.random.randint(0, 101, size=...) 

Checking its output, we get

>>> np.random.randint(0, 100, size=[5]) 
array([27, 47, 33, 12, 24]) 
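To see why integer ids matter, here is a rough sketch that uses a plain NumPy array as a stand-in for the embedding matrix. The embedding size of 8 and the names embeddings, word_ids and float_ids are assumptions made for illustration; the real model performs the lookup inside TensorFlow, whose behavior on invalid ids may differ from NumPy's.

```python
import numpy as np

VOCAB_SIZE = 101  # assumption: ids 0..100
EMBED_SIZE = 8    # assumption: arbitrary size, for illustration only

rng = np.random.RandomState(0)
# One row per word id, like an embedding matrix.
embeddings = rng.uniform(-1.0, 1.0, size=[VOCAB_SIZE, EMBED_SIZE])

# Integer ids select rows of the matrix.
word_ids = rng.randint(0, VOCAB_SIZE, size=[4, 5])
looked_up = embeddings[word_ids]  # shape (4, 5, 8)

# Floating-point "ids" cannot be used as row indices at all.
float_ids = rng.normal(0, 100, size=[4, 5])
try:
    embeddings[float_ids]
    float_lookup_error = None
except IndexError as exc:
    float_lookup_error = str(exc)

print(looked_up.shape)
print(float_lookup_error)
```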

Next, the following lines actually create a subtle problem.

def _run_epoch(self, data, session, inputs, rnn_ouputs, loss, train, verbose=10): 
    with session.as_default() as sess: 
     total_steps = sum(1 for x in data_iterator(data, self._batch_size, self._max_steps)) 
     train_loss = [] 
     for step, (x,y, l) in enumerate(data_iterator(data, self._batch_size, self._max_steps)): 
      print "step - {0}".format(step) 
      feed = { 
       self.input_placeholder: x, 
       self.label_placeholder: y, 
       self.sequence_length: l, 
       self._dropout_placeholder: self._dropout, 
      } 
      loss, _= sess.run([loss, train], feed_dict=feed) 
      print "loss - {0}".format(loss) 
      train_loss.append(loss) 
      if verbose and step % verbose == 0: 
       sys.stdout.write('\r{}/{} : pp = {}'. format(step, total_steps, np.exp(np.mean(train_loss)))) 
       sys.stdout.flush() 
      if verbose: 
       sys.stdout.write('\r') 

     return np.exp(np.mean(train_loss)) 

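loss is both a parameter of the function and the name that the result of sess.run is assigned to. After the first step, loss no longer refers to a tensor but to a numpy.float32 value, so on the next step it cannot be passed to sess.run. This is exactly the TypeError in the stack trace: the fetch argument is the nan value returned by the previous step. Assigning the result to a different name, such as loss_value, avoids the problem.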
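The name-rebinding problem can be reproduced without TensorFlow. The sketch below uses FakeTensor and FakeSession, which are hypothetical stand-ins invented for this example (they are not TensorFlow classes); the only behavior they imitate is that run() rejects anything that is not a tensor.

```python
class FakeTensor(object):
    """Hypothetical stand-in for a graph tensor holding a fixed value."""

    def __init__(self, value):
        self.value = value


class FakeSession(object):
    """Hypothetical stand-in for a session: run() accepts only FakeTensor."""

    def run(self, fetch):
        if not isinstance(fetch, FakeTensor):
            raise TypeError("Fetch argument %r has invalid type %r"
                            % (fetch, type(fetch)))
        return fetch.value


def run_epoch_buggy(sess, loss, steps):
    for _ in range(steps):
        # Rebinds the name `loss` to a plain float after the first step,
        # so the second call to run() receives a float, not a tensor.
        loss = sess.run(loss)
    return loss


def run_epoch_fixed(sess, loss, steps):
    losses = []
    for _ in range(steps):
        # A separate name keeps `loss` pointing at the tensor.
        loss_value = sess.run(loss)
        losses.append(loss_value)
    return losses


sess = FakeSession()
loss_tensor = FakeTensor(2.5)

try:
    run_epoch_buggy(sess, loss_tensor, steps=2)
    buggy_error = None
except TypeError as exc:
    buggy_error = str(exc)

fixed_losses = run_epoch_fixed(sess, loss_tensor, steps=2)
print(buggy_error)
print(fixed_losses)
```

In _run_epoch the same change would mean writing something like loss_value, _ = sess.run([loss, train], feed_dict=feed) and appending loss_value to train_loss.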