Stacked RNN model setup in TensorFlow

I am confused about how to build a stacked LSTM model for text classification in TensorFlow.

My input data looks like this:

x_train = [[1.,1.,1.],[2.,2.,2.],[3.,3.,3.],...,[0.,0.,0.],[0.,0.,0.], 
      ...... #I trained the network in batch with batch size set to 32. 
      ] 
y_train = [[1.,0.],[1.,0.],[0.,1.],...,[1.,0.],[0.,1.]] 
# binary classification 

The skeleton of my code looks like this:

self._input = tf.placeholder(tf.float32, [self.batch_size, self.max_seq_length, self.vocab_dim], name='input') 
self._target = tf.placeholder(tf.float32, [self.batch_size, 2], name='target') 

lstm_cell = rnn_cell.BasicLSTMCell(self.vocab_dim, forget_bias=1.) 
lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=self.dropout_ratio) 
self.cells = rnn_cell.MultiRNNCell([lstm_cell] * self.num_layers) 
self._initial_state = self.cells.zero_state(self.batch_size, tf.float32) 

inputs = tf.nn.dropout(self._input, self.dropout_ratio) 
inputs = [tf.reshape(input_, (self.batch_size, self.vocab_dim)) for input_ in 
       tf.split(1, self.max_seq_length, inputs)] 

outputs, states = rnn.rnn(self.cells, inputs, initial_state=self._initial_state) 

# We only care about the output of the last RNN cell... 
y_pred = tf.nn.xw_plus_b(outputs[-1], tf.get_variable("softmax_w", [self.vocab_dim, 2]), tf.get_variable("softmax_b", [2])) 

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_pred, self._target)) 
correct_pred = tf.equal(tf.argmax(y_pred, 1), tf.argmax(self._target, 1)) 
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32)) 

train_op = tf.train.AdamOptimizer(self.lr).minimize(loss) 

init = tf.initialize_all_variables() 

with tf.Session() as sess: 
     initializer = tf.random_uniform_initializer(-0.04, 0.04) 
     with tf.variable_scope("model", reuse=True, initializer=initializer): 
      sess.run(init) 
      # generate batches here (omitted for clarity) 
      print sess.run([train_op, loss, accuracy], feed_dict={self._input: batch_x, self._target: batch_y}) 

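The problem is that, no matter how large the dataset is, the loss and accuracy show no sign of improving (they look completely random). What am I doing wrong?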

Update:

# First, load Word2Vec model in Gensim. 
model = Doc2Vec.load(word2vec_path) 

# Second, build the dictionary. 
gensim_dict = Dictionary() 
gensim_dict.doc2bow(model.vocab.keys(), allow_update=True) 
w2indx = {v: k + 1 for k, v in gensim_dict.items()} 
w2vec = {word: model[word] for word in w2indx.keys()} 

# Third, read data from a text file. 
for fname in fnames: 
     i = 0 
     with codecs.open(fname, 'r', encoding='utf8') as fr: 
      for line in fr: 
       tmp = [] 
       for t in line.split(): 

        tmp.append(t) 

       X_train.append(tmp) 
       i += 1 
       if i == samples_count: 
        break 

# Fourth, convert words into vectors, and pad each sentence with ZERO arrays to a fixed length. 
result = np.zeros((len(data), self.max_seq_length, self.vocab_dim), dtype=np.float32) 
    for rowNo in xrange(len(data)): 
     rowLen = len(data[rowNo]) 
     for colNo in xrange(rowLen): 
      word = data[rowNo][colNo] 
      if word in w2vec: 
       result[rowNo][colNo] = w2vec[word] 
      else: 
       result[rowNo][colNo] = [0] * self.vocab_dim 
     for colPadding in xrange(rowLen, self.max_seq_length): 
      result[rowNo][colPadding] = [0] * self.vocab_dim 
    return result 

# Fifth, generate batches and feed them to the model. 
... Trivias ... 

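It would help if you provided more than just skeleton code, because from the skeleton alone I cannot tell whether the problem is in your implementation of the model or is something more fundamental in the model itself. For example, you set the 'initializer' inside the 'with' construct, but all the variables are constructed before that. In that case the default initializer ([uniform_unit_scaling_initializer](https://www.tensorflow.org/versions/master/api_docs/python/state_ops.html#uniform_unit_scaling_initializer)) is used, not the "random_uniform_initializer" you intended. – keveman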

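Thanks, @keveman! As you suggested, I have added more code to the question. I trained the model for a whole night and it simply did not learn. I checked the biases and found that they are indeed being updated. I tried several learning rates, such as 0.1, 0.01, 0.001 and 0.0001, but all in vain. Please help me check whether anything else is wrong... –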

Answer

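Here are a few reasons why it may not be training, along with suggestions to try:

  • You are not allowing the word vectors to be updated, and the pre-learned vector space may not work well for this task.

  • RNNs really do need gradient clipping during training. You can try adding something like this.

  • Unit-scaling initialization seems to work better, because it takes the size of the layer into account and lets the gradients scale properly as the network gets deeper.

  • You should try removing dropout and the second layer, just to check that your data is being passed in correctly and that your loss goes down.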
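To make the initialization point concrete, here is a minimal sketch in plain Python (no TensorFlow) that compares a fixed uniform range of ±0.04, as in the question, with a range scaled by the layer's input size. It assumes the unit-scaling rule samples from Uniform[-sqrt(3/n), sqrt(3/n)] for an input size n, which is my reading of the TensorFlow documentation for uniform_unit_scaling_initializer. The value `vocab_dim = 100` and the helper names are hypothetical and chosen only for illustration.

```python
import math


def unit_scaling_bound(input_size, factor=1.0):
    """Half-width of a uniform range scaled by the layer's input size.

    Assumed rule: bound = factor * sqrt(3 / input_size).
    """
    return factor * math.sqrt(3.0 / input_size)


def uniform_variance(bound):
    """Variance of a Uniform[-bound, bound] distribution."""
    return bound ** 2 / 3.0


vocab_dim = 100      # hypothetical embedding size, not taken from the question
fixed_bound = 0.04   # the range used by random_uniform_initializer in the question
scaled_bound = unit_scaling_bound(vocab_dim)

# For y = sum_i w_i * x_i with independent zero-mean weights and unit-variance
# inputs, Var(y) = input_size * Var(w). A gain near 1 keeps the signal scale
# roughly constant from layer to layer.
fixed_gain = vocab_dim * uniform_variance(fixed_bound)
scaled_gain = vocab_dim * uniform_variance(scaled_bound)

print("fixed  bound %.4f -> variance gain %.4f" % (fixed_bound, fixed_gain))
print("scaled bound %.4f -> variance gain %.4f" % (scaled_bound, scaled_gain))
```

Under these assumptions the fixed range shrinks the signal variance at every layer, while the scaled range keeps it close to 1, which is one way to read "takes the size of the layer into account".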

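I can also recommend trying this example with your data: https://github.com/tensorflow/skflow/blob/master/examples/text_classification.py

It trains the word vectors from scratch, already has gradient clipping, and uses GRUCells, which are usually easier to train. You can also get a nice visualization of the loss and other quantities by running tensorboard --logdir=/tmp/tf_examples/word_rnn.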
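As a rough illustration of the gradient clipping mentioned above, the following plain-Python sketch rescales a list of gradients so that their joint L2 norm does not exceed a threshold. It is not the code from the linked example; the function name mirrors TensorFlow's `tf.clip_by_global_norm`, but this is a standalone reimplementation, and the gradient values and `clip_norm=5.0` are made up for the demonstration.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients by one common factor if their joint norm is too large.

    grads: list of gradients, each a flat list of floats (one per variable).
    Returns (clipped_grads, global_norm_before_clipping).
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if global_norm <= clip_norm:
        # Already within the limit: return copies unchanged.
        return [list(grad) for grad in grads], global_norm
    scale = clip_norm / global_norm
    return [[g * scale for g in grad] for grad in grads], global_norm


# Hypothetical gradients for two variables; their joint norm is sqrt(9 + 16 + 144).
grads = [[3.0, 4.0], [12.0]]
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)

print("norm before clipping:", norm)
print("clipped gradients:", clipped)
```

In the question's TensorFlow code, the equivalent step would probably replace the single `minimize(loss)` call with the optimizer's `compute_gradients`, a call to `tf.clip_by_global_norm` on the gradients, and then `apply_gradients`. Because all gradients share one scale factor, the update direction is preserved and only its length is limited.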

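Thanks for the suggestions. I will try adding gradient clipping, and possibly normalization later. The pre-learned vectors should work fine, because I used them for another task and everything worked. –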