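Using TensorFlow's Connectionist Temporal Classification (CTC) implementation
I'm trying to use TensorFlow's CTC implementation from the contrib package (tf.contrib.ctc.ctc_loss), without success.
- First of all, does anyone know where I can read a good step-by-step tutorial? TensorFlow's documentation is very poor on this topic.
- Do I have to provide ctc_loss with labels that have the blank label interleaved, or not?
- I could not get my network to overfit the training dataset, even after more than 200 epochs. :(
- How do I calculate the label error rate using tf.edit_distance?
Here is my code: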
with graph.as_default():
    max_length = X_train.shape[1]
    frame_size = X_train.shape[2]
    max_target_length = y_train.shape[1]
    # Batch size x time steps x data width
    data = tf.placeholder(tf.float32, [None, max_length, frame_size])
    data_length = tf.placeholder(tf.int32, [None])
    # Batch size x max_target_length
    target_dense = tf.placeholder(tf.int32, [None, max_target_length])
    target_length = tf.placeholder(tf.int32, [None])
    # Generating sparse tensor representation of target
    target = ctc_label_dense_to_sparse(target_dense, target_length)
    # Applying LSTM, returning output for each timestep (y_rnn1,
    # [batch_size, max_time, cell.output_size]) and the final state of shape
    # [batch_size, cell.state_size].
    # num_proj=num_classes projects each output to num_classes values.
    y_rnn1, h_rnn1 = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True, num_proj=num_classes),
        data,
        dtype=tf.float32,
        sequence_length=data_length,
    )
    # For sequence labelling, we want a prediction for each timestep.
    # However, we share the weights for the softmax layer across all timesteps.
    # How do we do that? By flattening the first two dimensions of the output tensor.
    # This way time steps look the same as examples in the batch to the weight matrix.
    # Afterwards, we reshape back to the desired shape.
    # Transposing to time-major: [max_time, batch_size, num_classes]
    logits = tf.transpose(y_rnn1, perm=(1, 0, 2))
    # Get the loss by calculating ctc_loss, which also calculates the gradient.
    # This op performs the softmax operation for you, so inputs
    # should be e.g. linear projections of outputs by an LSTM.
    loss = tf.reduce_mean(tf.contrib.ctc.ctc_loss(logits, target, data_length))
    # Define our optimizer with learning rate
    optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)
    # Decoding using beam search
    decoded, log_probabilities = tf.contrib.ctc.ctc_beam_search_decoder(logits, data_length, beam_width=10, top_paths=1)
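Thanks!
Update (June 29, 2016)
Thank you, @jihyeon-seo! So, the input to the RNN has a shape like [num_batch, max_time_step, num_features]. We use dynamic_rnn to perform the recurrent computation on that input, which outputs a tensor of shape [num_batch, max_time_step, num_hidden]. After that, we need an affine projection at each time step with shared weights, so we have to reshape to [num_batch * max_time_step, num_hidden], multiply by a weight matrix of shape [num_hidden, num_classes], add a bias, undo the reshape, and transpose (so that we have [max_time_steps, num_batch, num_classes] for the CTC loss input). This result is the input to the ctc_loss function. Did I do everything correctly?
Here is the code: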
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)
# Reshaping to share weights across timesteps
x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])
self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1
# Reshaping
self._logits = tf.reshape(self._logits, [max_length, -1, num_classes])
# Calculating loss
loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)
self.cost = tf.reduce_mean(loss)
Update (July 11, 2016)
Thank you, @Xiv. Here is the code after the bug fix:
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)
# Reshaping to share weights across timesteps
x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])
self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1
# Reshaping
self._logits = tf.reshape(self._logits, [-1, max_length, num_classes])
self._logits = tf.transpose(self._logits, (1,0,2))
# Calculating loss
loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)
self.cost = tf.reduce_mean(loss)
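The shape bookkeeping in the June 29 update (flatten, project, undo the reshape, transpose) can be checked outside TensorFlow. The following is a minimal NumPy sketch, not the original model: the sizes, the random arrays, and the names rnn_out, W and b are assumptions made up for illustration. It compares the flatten-and-matmul route against projecting every (batch, time) vector separately.

```python
import numpy as np

# Hypothetical small sizes, chosen only for illustration.
num_batch, max_time, num_hidden, num_classes = 2, 3, 4, 5

rng = np.random.default_rng(0)
rnn_out = rng.normal(size=(num_batch, max_time, num_hidden))  # batch-major RNN output
W = rng.normal(size=(num_hidden, num_classes))                # shared projection weights
b = rng.normal(size=(num_classes,))                           # shared bias

# Flatten batch and time so one weight matrix is shared across all time steps.
flat = rnn_out.reshape(-1, num_hidden)       # [num_batch * max_time, num_hidden]
logits_flat = flat @ W + b                   # [num_batch * max_time, num_classes]

# Undo the flattening (still batch-major), then switch to time-major.
logits_bm = logits_flat.reshape(num_batch, max_time, num_classes)
logits_tm = logits_bm.transpose(1, 0, 2)     # [max_time, num_batch, num_classes]

# Reference: project each (batch, time) vector on its own.
reference = np.einsum("bth,hc->btc", rnn_out, W) + b
pipeline_matches = np.allclose(logits_bm, reference)
```

If pipeline_matches is True, the time-major entry logits_tm[t, n] is the projection of rnn_out[n, t], which is the layout a time-major CTC loss input is described as needing in the update above.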
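Update (July 25, 2016)
I published part of my code on GitHub, working with one utterance. Feel free to use it! :)
There is a bug in your code in the reshape after the RNN. If the matrix is time major, then your reshape is correct, but the RNN needs to have time_major=True passed in. If the matrix is batch major, then you need tf.transpose(tf.reshape(logits, [-1, max_length, num_classes]), [1, 0, 2]) – Xiv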
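The difference between the two reshapes mentioned in this comment is easy to see on a tiny array. This is a NumPy sketch under assumed toy sizes, not the poster's code: each fake "logit" is set to 10 * batch_index + time_index so its origin stays visible after reshaping.

```python
import numpy as np

# Hypothetical toy sizes; num_classes = 1 keeps the printout readable.
num_batch, max_length, num_classes = 2, 3, 1

# Batch-major values that encode their own position: 10 * batch + time.
batch_idx = np.arange(num_batch).reshape(-1, 1, 1)
time_idx = np.arange(max_length).reshape(1, -1, 1)
logits_bm = 10 * batch_idx + time_idx            # shape [2, 3, 1]

# Flattened form, as it comes out of the shared matmul.
flat = logits_bm.reshape(-1, num_classes)        # rows: 0, 1, 2, 10, 11, 12

# Reshaping straight to [max_length, -1, num_classes] mixes sequences:
wrong = flat.reshape(max_length, -1, num_classes)
# wrong[:, :, 0] -> [[0, 1], [2, 10], [11, 12]]

# Reshaping back to batch-major and then transposing keeps them apart:
right = flat.reshape(-1, max_length, num_classes).transpose(1, 0, 2)
# right[:, :, 0] -> [[0, 10], [1, 11], [2, 12]]
```

In right, row t holds time step t of every sequence, whereas in wrong the first row holds two time steps of the same sequence. Both arrays have the same shape, so a shape check alone would not reveal the problem.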