Using TensorFlow's Connectionist Temporal Classification (CTC) implementation

I'm trying to use TensorFlow's CTC implementation under the contrib package (tf.contrib.ctc.ctc_loss), without success.

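First, does anyone know where I can read a good step-by-step tutorial? TensorFlow's documentation is very poor on this topic.
Do I have to provide ctc_loss with the labels interleaved with the blank label, or not?
I was not able to overfit my network, even using a training dataset of length one over 200 epochs. :(
How can I calculate the label error rate using tf.edit_distance?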

Here is my code:

with graph.as_default(): 

    max_length = X_train.shape[1] 
    frame_size = X_train.shape[2] 
    max_target_length = y_train.shape[1] 

    # Batch size x time steps x data width 
    data = tf.placeholder(tf.float32, [None, max_length, frame_size]) 
    data_length = tf.placeholder(tf.int32, [None]) 

    # Batch size x max_target_length 
    target_dense = tf.placeholder(tf.int32, [None, max_target_length]) 
    target_length = tf.placeholder(tf.int32, [None]) 

    # Generating sparse tensor representation of target 
    target = ctc_label_dense_to_sparse(target_dense, target_length) 

    # Applying LSTM, returning output for each timestep (y_rnn1,
    # [batch_size, max_time, cell.output_size]) and the final state of shape
    # [batch_size, cell.state_size]
    y_rnn1, h_rnn1 = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True, num_proj=num_classes),
        data,
        dtype=tf.float32,
        sequence_length=data_length,
    )

    # For sequence labelling, we want a prediction for each timestamp. 
    # However, we share the weights for the softmax layer across all timesteps. 
    # How do we do that? By flattening the first two dimensions of the output tensor. 
    # This way time steps look the same as examples in the batch to the weight matrix. 
    # Afterwards, we reshape back to the desired shape 


    # Transposing to time-major [max_time, batch_size, num_classes], as ctc_loss expects
    logits = tf.transpose(y_rnn1, perm=(1, 0, 2))

    # Get the loss by calculating ctc_loss, which also calculates the gradient.
    # ctc_loss performs the softmax operation for you, so its inputs should be
    # e.g. linear projections of the LSTM outputs.
    loss = tf.reduce_mean(tf.contrib.ctc.ctc_loss(logits, target, data_length))

    # Define our optimizer with learning rate 
    optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss) 

    # Decoding using beam search 
    decoded, log_probabilities = tf.contrib.ctc.ctc_beam_search_decoder(logits, data_length, beam_width=10, top_paths=1)

Thanks!

Update (June 29, 2016)

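Thanks @jihyeon-seo! So, the input to the RNN has shape [num_batch, max_time_step, num_features]. We use dynamic_rnn to perform the recurrent computation on that input, which outputs a tensor of shape [num_batch, max_time_step, num_hidden]. After that, we need an affine projection at each time step with shared weights, so we have to reshape to [num_batch * max_time_step, num_hidden], multiply by a weight matrix of shape [num_hidden, num_classes], add the bias, reshape back, and transpose (so we get [max_time_steps, num_batch, num_classes] as the CTC loss input); this result is the input to the ctc_loss function. Am I doing everything right?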
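The shape bookkeeping described above can be checked outside TensorFlow. The following is a minimal NumPy sketch (NumPy stands in for tf.reshape, tf.matmul and tf.transpose; the sizes and random weights are made up for illustration) showing that flattening batch and time, applying one shared [num_hidden, num_classes] matrix, reshaping back to batch-major, and then transposing gives the same time-major logits as projecting each time step separately.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_batch, max_time_step, num_hidden, num_classes = 2, 5, 4, 3

rng = np.random.default_rng(0)
h_rnn = rng.standard_normal((num_batch, max_time_step, num_hidden))  # batch-major RNN output
W = rng.standard_normal((num_hidden, num_classes))  # one matrix shared by all time steps
b = rng.standard_normal(num_classes)

# Flatten batch and time so every time step looks like an extra example.
flat = h_rnn.reshape(-1, num_hidden)                             # [num_batch * max_time_step, num_hidden]
logits = flat @ W + b                                            # [num_batch * max_time_step, num_classes]
logits = logits.reshape(num_batch, max_time_step, num_classes)   # back to batch-major
logits_tm = logits.transpose(1, 0, 2)                            # [max_time_step, num_batch, num_classes]

# Reference: apply the same projection one time step at a time.
loop = np.stack([h_rnn[:, t, :] @ W + b for t in range(max_time_step)])

assert logits_tm.shape == (max_time_step, num_batch, num_classes)
assert np.allclose(logits_tm, loop)
```

The reshape back must restore the batch-major layout that the flattening came from before the transpose; reshaping straight to the time-major shape is a different (wrong) ordering.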

Here is the code:

cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)

# Reshaping to share weights across timesteps
x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])

self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1

# Reshaping
self._logits = tf.reshape(self._logits, [max_length, -1, num_classes])

# Calculating loss
loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)

self.cost = tf.reduce_mean(loss)

Update (July 11, 2016)

Thanks @Xiv. Here is the code after the bug fix:

cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)

# Reshaping to share weights across timesteps
x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])

self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1

# Back to batch-major [batch, max_length, num_classes], then transpose to time-major
self._logits = tf.reshape(self._logits, [-1, max_length, num_classes])
self._logits = tf.transpose(self._logits, (1, 0, 2))

# Calculating loss
loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)

self.cost = tf.reduce_mean(loss)

Update (July 25, 2016)

I published part of my code on GitHub, working with one utterance. Feel free to use it! :)

2016-06-27 Igor Macedo Quintanilha

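There is an error in your code in the reshape after the RNN. If the matrix is time-major, your reshape is correct, but then the RNN needs time_major=True passed in. If the matrix is batch-major, you need tf.transpose(tf.reshape(self._logits, [-1, max_length, num_classes]), [1, 0, 2]). – Xiv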
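Xiv's point can be seen with a toy NumPy example (NumPy's reshape uses the same row-major ordering as tf.reshape; the sizes here are made up). Each logit is set to 10 * batch_index + time_step, so it is easy to see which utterance and frame lands at each [time, batch] position after the two candidate reshapes:

```python
import numpy as np

batch, max_length, num_classes = 2, 3, 1

# Batch-major logits [batch, max_length, num_classes]; value = 10 * b + t.
batch_major = (10 * np.arange(batch)[:, None] + np.arange(max_length)[None, :])[..., None]

# What the shared matmul sees after flattening: rows ordered b0t0, b0t1, b0t2, b1t0, ...
flat = batch_major.reshape(-1, num_classes)

# Buggy: reinterprets batch-major rows directly as time-major.
wrong = flat.reshape(max_length, -1, num_classes)
# Fixed: restore batch-major first, then swap the batch and time axes.
right = flat.reshape(-1, max_length, num_classes).transpose(1, 0, 2)

print(wrong[..., 0])  # rows [0, 1], [2, 10], [11, 12]: "time 0" holds two frames of utterance 0
print(right[..., 0])  # rows [0, 10], [1, 11], [2, 12]: right[t, b] == 10 * b + t
```

The buggy version still has exactly the shape ctc_loss expects, so no error is raised; the frames are silently scrambled across time and batch.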

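I'm trying to do the same thing. Here is what I found, which you might be interested in.

It's really hard to find a tutorial for CTC, but this example (https://github.com/tensorflow/tensorflow/blob/679f95e9d8d538c3c02c0da45606bab22a71420e/tensorflow/python/kernel_tests/ctc_loss_op_test.py) was helpful.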

For the blank label, the CTC layer assumes the blank index is num_classes - 1, so you need to provide one extra class for the blank label. (https://github.com/tensorflow/tensorflow/blob/d42facc3cc9611f0c9722c81551a7404a0bd3f6b/tensorflow/core/kernels/ctc_loss_op.cc, line 146)
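A minimal sketch of preparing targets under that convention (the alphabet and the helpers encode and sparse_tuple are hypothetical, not part of TensorFlow): the blank gets the last index, and the label sequences are passed without any blanks interleaved, since the CTC op inserts them internally. The resulting (indices, values, dense_shape) triple is what you would wrap in a tf.SparseTensor for the labels argument, with int32 values and int64 indices and shape.

```python
import numpy as np

# Hypothetical label alphabet; real labels would be e.g. TIMIT phonemes.
alphabet = ["a", "b", "c"]
num_classes = len(alphabet) + 1  # one extra class for the CTC blank
blank = num_classes - 1          # the blank takes the last index


def encode(seq):
    """Map symbols to class indices; no blanks are inserted here."""
    return [alphabet.index(s) for s in seq]


def sparse_tuple(sequences):
    """Build (indices, values, dense_shape), the pieces of a sparse label tensor."""
    indices = [(i, j) for i, seq in enumerate(sequences) for j in range(len(seq))]
    values = [v for seq in sequences for v in seq]
    shape = (len(sequences), max(len(seq) for seq in sequences))
    return (np.array(indices, dtype=np.int64),
            np.array(values, dtype=np.int32),
            np.array(shape, dtype=np.int64))


targets = [encode("abca"), encode("cb")]
indices, values, dense_shape = sparse_tuple(targets)
assert blank not in values  # the blank class never appears in the targets
```

Building the sparse triple from the unpadded sequences also avoids having padding values leak into the targets, which can happen when converting a padded dense matrix.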

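Also, the CTC loss performs the softmax itself. In your code, the RNN layer is connected directly to the CTC loss layer, but the RNN's outputs have already gone through its internal activations, so you need to add a layer without an activation function (it can be the output layer) and then connect that to the CTC loss layer.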
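To make "the CTC loss performs the softmax" concrete, here is a minimal NumPy sketch of the standard CTC forward (alpha) recursion for a single utterance. It is an illustration, not TensorFlow's kernel; the function names are hypothetical, and it assumes the blank index num_classes - 1 described above. It takes unnormalized logits, applies log-softmax itself, and builds the blank-interleaved label sequence internally:

```python
import numpy as np


def log_softmax(x):
    # Numerically stable log-softmax over the class axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def ctc_neg_log_likelihood(logits, labels, blank):
    """CTC loss (negative log-likelihood) for one utterance.

    logits: [T, num_classes] unnormalized scores, i.e. a linear layer's output.
    labels: target class indices WITHOUT blanks (blanks are interleaved below).
    blank:  index of the blank class, assumed here to be num_classes - 1.
    """
    log_probs = log_softmax(np.asarray(logits, dtype=float))  # softmax happens inside the loss
    ext = [blank]
    for label in labels:
        ext += [label, blank]  # e.g. [a, b] -> [_, a, _, b, _]
    T, S = log_probs.shape[0], len(ext)

    # alpha[t, s]: log-probability of all length-(t + 1) path prefixes ending in ext[s].
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            terms = [alpha[t - 1, s]]
            if s >= 1:
                terms.append(alpha[t - 1, s - 1])
            # Skipping over a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(terms) + log_probs[t, ext[s]]

    # A valid path ends on the last label or on the trailing blank.
    total = alpha[T - 1, S - 1]
    if S > 1:
        total = np.logaddexp(total, alpha[T - 1, S - 2])
    return -total
```

Because of the internal log-softmax, adding a constant to every logit leaves the loss unchanged, so the layer in front of the loss should be a plain linear projection; putting your own softmax there would squash the scores a second time.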

2016-06-28 07:04:10

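Thanks @jihyeon-seo. Did you have any problems training your network with the CTC loss? This network is very hard to overfit, even though many papers say LSTM networks overfit easily: I couldn't overfit my network with 1 LSTM layer of 320 memory cells using only 1 utterance (TIMIT corpus, with filter bank features), even after 2000 epochs. :( –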

After 100 epochs, I got an LSTM model that fits one sentence. –

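I think you should check the input and output tensors between the LSTM layer and the CTC loss layer. Have you checked whether the loss returned by the CTC layer changes from epoch to epoch? –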

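For an example of a bidirectional LSTM, CTC, and edit distance implementation, see here; it trains a phoneme recognition model on the TIMIT corpus. If you train on that corpus's training set, you should be able to get the phoneme error rate down to 20-25% after 120 minutes or so.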
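The phoneme (label) error rate mentioned above is an edit distance normalized by the reference length. In the graph, a common pattern is tf.reduce_mean(tf.edit_distance(tf.cast(decoded[0], tf.int32), target)): normalize=True is the default, and the cast is there because the decoder output is int64 while the targets are int32. The pure-Python sketch below (helper names are hypothetical) computes the same per-utterance quantity, which can be handy for sanity-checking the TensorFlow number; it assumes every reference sequence is non-empty.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance: substitutions, insertions and deletions all cost 1."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis prefix
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,             # delete h
                           cur[j - 1] + 1,          # insert r
                           prev[j - 1] + (h != r)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def label_error_rate(hyps, refs):
    """Mean over utterances of edit distance / reference length."""
    return sum(edit_distance(h, r) / len(r) for h, r in zip(hyps, refs)) / len(refs)


print(label_error_rate([[0, 1, 2], [3]], [[0, 2], [3]]))
```

This averages per-utterance rates, as reduce_mean over tf.edit_distance does; a corpus-level error rate is often computed instead as total distance over total reference length, which weights long utterances more.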

2016-07-13 13:11:06

Thanks, Jon Rein! :) –

Glad to help. If it works for you, would you mind accepting the answer? –
