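Siamese model with LSTM network fails to train using TensorFlow
Dataset description
The dataset contains a set of question pairs and a label that tells whether the two questions are the same. For example:
"How do I read and find my YouTube comments?", "How can I see all my Youtube comments?", "1"
The goal of the model is to determine whether a given question pair is the same or different.
Approach
I have created a Siamese network to identify whether two questions are the same. Following is the model: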
with graph.as_default():
    # Euclidean distance between the last-time-step outputs of the two branches; shape [batch_size]
    diff = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(question1_outputs[:, -1, :], question2_outputs[:, -1, :])), reduction_indices=1))

    margin = tf.constant(1.)
    labels = tf.to_float(labels)
    # Both terms are expanded to shape [1, batch_size]
    match_loss = tf.expand_dims(tf.square(diff, 'match_term'), 0)
    mismatch_loss = tf.expand_dims(tf.maximum(0., tf.subtract(margin, tf.square(diff)), 'mismatch_term'), 0)

    # labels is fed as [batch_size, 1], so each matmul yields a [batch_size, batch_size] matrix
    loss = tf.add(tf.matmul(labels, match_loss), tf.matmul((1 - labels), mismatch_loss), 'loss_add')
    distance = tf.reduce_mean(loss)

    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(distance)
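For reference, the per-pair loss that the snippet above appears to be aiming for can be written out in plain Python. This is only a sketch under assumptions: the helper names (`euclidean_distance`, `contrastive_loss`, `mean_contrastive_loss`) are hypothetical, a label of 1 is taken to mean "same question", and the default margin of 1.0 mirrors `margin` in the snippet. It computes one loss value per pair and then averages, which is a simplification and not the `tf.matmul` formulation used above.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(dist, label, margin=1.0):
    """Loss for a single pair.

    label == 1 (same question): penalise a large distance.
    label == 0 (different questions): penalise a squared distance below margin.
    """
    match_term = dist ** 2
    mismatch_term = max(0.0, margin - dist ** 2)
    return label * match_term + (1 - label) * mismatch_term


def mean_contrastive_loss(pairs, labels, margin=1.0):
    """Average the per-pair loss over a batch of (vector, vector) pairs."""
    losses = [
        contrastive_loss(euclidean_distance(u, v), y, margin)
        for (u, v), y in zip(pairs, labels)
    ]
    return sum(losses) / len(losses)


# Toy batch: hypothetical 2-d "encodings", not real model outputs.
pairs = [([0.0, 0.0], [0.0, 0.0]), ([0.0, 0.0], [3.0, 4.0])]
labels = [1, 0]
batch_loss = mean_contrastive_loss(pairs, labels)
```

In this toy batch the matching pair has distance 0 and the non-matching pair is further apart than the margin, so neither pair contributes to the loss.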
Following is the code to train the model:
with graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=graph) as sess:
    # The pre-trained embedding matrix is fed once, when the variables are initialised
    sess.run(tf.global_variables_initializer(), feed_dict={embedding_placeholder: embedding_matrix})

    iteration = 1
    for e in range(epochs):
        summary_writer = tf.summary.FileWriter('/Users/mithun/projects/kaggle/quora_question_pairs/logs', sess.graph)
        summary_writer.add_graph(sess.graph)

        for ii, (x1, x2, y) in enumerate(get_batches(question1_train, question2_train, label_train, batch_size), 1):
            feed = {question1_inputs: x1,
                    question2_inputs: x2,
                    labels: y[:, None],
                    keep_prob: 0.9
                    }
            # Evaluate the loss tensor for this batch
            loss1 = sess.run([distance], feed_dict=feed)

            if iteration%5==0:
                print("Epoch: {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss1))

            if iteration%50==0:
                val_acc = []
                for x1, x2, y in get_batches(question1_val, question2_val, label_val, batch_size):
                    feed = {question1_inputs: x1,
                            question2_inputs: x2,
                            labels: y[:, None],
                            keep_prob: 1
                            }
                    # `accuracy` is defined elsewhere (not shown in this post)
                    batch_acc = sess.run([accuracy], feed_dict=feed)
                    val_acc.append(batch_acc)
                print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration +=1

    saver.save(sess, "checkpoints/quora_pairs.ckpt")
I have trained the model above. The graph is defined as follows:
graph = tf.Graph()

with graph.as_default():
    embedding_placeholder = tf.placeholder(tf.float32, shape=embedding_matrix.shape, name='embedding_placeholder')

    with tf.variable_scope('siamese_network') as scope:
        labels = tf.placeholder(tf.int32, [batch_size, None], name='labels')
        keep_prob = tf.placeholder(tf.float32, name='question1_keep_prob')

        # Branch for the first question
        with tf.name_scope('question1') as question1_scope:
            question1_inputs = tf.placeholder(tf.int32, [batch_size, seq_len], name='question1_inputs')

            # Pre-trained embeddings, kept frozen
            question1_embedding = tf.get_variable(name='embedding', initializer=embedding_placeholder, trainable=False)
            question1_embed = tf.nn.embedding_lookup(question1_embedding, question1_inputs)

            question1_lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
            question1_drop = tf.contrib.rnn.DropoutWrapper(question1_lstm, output_keep_prob=keep_prob)
            question1_multi_lstm = tf.contrib.rnn.MultiRNNCell([question1_drop] * lstm_layers)

            q1_initial_state = question1_multi_lstm.zero_state(batch_size, tf.float32)

            question1_outputs, question1_final_state = tf.nn.dynamic_rnn(question1_multi_lstm, question1_embed, initial_state=q1_initial_state)

        # Intended to make the second branch share the first branch's variables
        scope.reuse_variables()

        # Branch for the second question
        with tf.name_scope('question2') as question2_scope:
            question2_inputs = tf.placeholder(tf.int32, [batch_size, seq_len], name='question2_inputs')

            question2_embedding = question1_embedding
            question2_embed = tf.nn.embedding_lookup(question2_embedding, question2_inputs)

            question2_lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
            question2_drop = tf.contrib.rnn.DropoutWrapper(question2_lstm, output_keep_prob=keep_prob)
            question2_multi_lstm = tf.contrib.rnn.MultiRNNCell([question2_drop] * lstm_layers)

            q2_initial_state = question2_multi_lstm.zero_state(batch_size, tf.float32)

            question2_outputs, question2_final_state = tf.nn.dynamic_rnn(question2_multi_lstm, question2_embed, initial_state=q2_initial_state)
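The cosine distance is calculated from the RNN outputs, and the model was trained on about 10,000 labelled pairs. However, the accuracy stagnates at around 0.630 and, strangely, the validation accuracy is the same across all iterations. The hyperparameters are: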
lstm_size = 64
lstm_layers = 1
batch_size = 128
learning_rate = 0.001
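The text above mentions a cosine distance, while the `diff` tensor in the loss code is a Euclidean norm. The two measures are not interchangeable, which the following plain-Python sketch illustrates. The function names and the toy vectors are hypothetical and are not taken from the model.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance; sensitive to vector length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


u = [1.0, 0.0]
v = [2.0, 0.0]   # same direction as u, different length
w = [0.0, 1.0]   # orthogonal to u

same_direction = (euclidean_distance(u, v), cosine_distance(u, v))
orthogonal = (euclidean_distance(u, w), cosine_distance(u, w))
```

Vectors that point the same way but differ in length are far apart in the Euclidean sense and identical in the cosine sense, so a margin tuned for one measure may not suit the other.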
Is there anything wrong with the way I have created the model?
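A good first pass for debugging: make the network completely linear and fit it to one or two trivial examples. Once it fits (surprisingly often it won't), slowly reintroduce the non-linearities. Since the learning task is trivial, you can attribute slow or non-existent learning to dead or saturated non-linearities.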
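A minimal sketch of that kind of sanity check, in plain Python rather than TensorFlow: a purely linear model is fitted to two trivial examples with gradient descent, and the loss should fall to essentially zero. The function names, the learning rate, the step count and the two data points are all assumptions made for illustration.

```python
def fit_linear(examples, learning_rate=0.1, steps=2000):
    """Fit y = w * x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(examples, w, b):
    return sum((w * x + b - y) ** 2 for x, y in examples) / len(examples)


# Two trivial, hypothetical examples; a linear model can fit them exactly.
examples = [(0.0, 0.0), (1.0, 1.0)]
w, b = fit_linear(examples)
final_loss = mean_squared_error(examples, w, b)
```

If even a setup like this does not drive the loss down, the problem is more likely in the training loop or the loss than in the non-linearities.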
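It's hard to say what is going on with the accuracy (I'm not familiar with the dataset or the architecture), but a few things. Not sure why you don't want to learn your embeddings, but in that case you should say `trainable=False`, not `trainable='false'`, which will not work. Also, it shouldn't hurt, but I don't think you need `scope.reuse_variables()`, or `tf.sqrt` for `diff` if you square it in two different places later. – jdehesa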
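The point about `trainable` can be illustrated in plain Python: a non-empty string is truthy, so the string `'false'` does not behave like the boolean `False` in any check based on truthiness. The function `is_frozen` below is hypothetical and only stands in for such a check; how TensorFlow itself handles the argument is not shown here.

```python
def is_frozen(trainable):
    """Hypothetical stand-in for a library check of the form `if not trainable`."""
    return not trainable


frozen_with_bool = is_frozen(False)       # boolean False
frozen_with_string = is_frozen('false')   # non-empty string, which is truthy
```

Under this assumption the boolean freezes the variable while the string leaves it trainable.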
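I have updated the question with a brief dataset description and the goal of the model. 1) I set `trainable=False` because I am using pre-trained word embeddings. 2) I am using a Siamese network here; at a high level it involves two identical networks that use the same weights, and we then find the distance between the outputs of the two networks. If the distance is below a threshold the questions are the same, otherwise they are not. Hence I used `scope.reuse_variables`. – Mithun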
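The high-level idea described in this comment can be sketched in plain Python: a single set of weights encodes both inputs, and the distance between the two encodings is compared against a threshold. The toy linear `encode` function, the weights, the inputs and the threshold are hypothetical and chosen only to exercise the code path; they stand in for the LSTM branches of the real model.

```python
import math


def encode(features, weights):
    """Toy shared encoder: one weighted sum of the inputs per output dimension.

    `weights` is a list of rows, one row per output dimension.
    """
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_same_question(q1, q2, weights, threshold):
    """Encode both inputs with the same weights, then threshold the distance."""
    d = euclidean_distance(encode(q1, weights), encode(q2, weights))
    return d < threshold


# Hypothetical numbers, not real question encodings.
shared_weights = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
q_a = [1.0, 2.0, 3.0]
q_b = [1.0, 2.0, 3.0]   # identical to q_a
q_c = [9.0, 0.0, 1.0]

same_ab = is_same_question(q_a, q_b, shared_weights, threshold=1.0)
same_ac = is_same_question(q_a, q_c, shared_weights, threshold=1.0)
```

Because both inputs pass through the same `shared_weights`, identical inputs always end up at distance zero, which is the property that weight sharing is meant to provide.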