How do I share weights across different RNN cells with different inputs in Tensorflow?

I am curious whether there is a good way to share weights across several RNN cells while still feeding each cell a different input.

The graph I would like to build looks like this:

[Diagram of the desired network architecture]

There are three orange LSTM cells in the diagram that operate in parallel, and I would like them to share weights.

I have managed to implement something similar to what I want using placeholders (see the code below). However, using placeholders breaks the optimizer's gradient computation, and nothing beyond the point where the placeholders are used gets trained. Is it possible to do this in Tensorflow?

I am using Tensorflow 1.2 and Python 3.5 in an Anaconda environment on Windows 7.

Code:

def ann_model(cls, data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1,
                                                        cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data, ann_weights) + ann_biases)
    return out

def rnn_lower_model(cls, data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length, 1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i], [cls.batch_size,
                                                     cls.n_rnn_inputs])

        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)

        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
                                                    data_tens,
                                                    dtype=tf.float32)

        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                tf.random_normal([cls.n_rnn_nodes_lower, 1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))

        # Encode the output of the RNN into one estimate per entry in
        # the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i],
                                          out_weights)
                                + out_biases)
    return predict_list

def create_graph(cls, sess):
    # Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float', [cls.batch_size,
                                         cls.sequence_length,
                                         cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float', [cls.batch_size, 1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float', [cls.batch_size, 1])

    # Define placeholder to provide variable input into the
    # RNNs with shared weights
    cls.input_place = tf.placeholder('float', [cls.batch_size,
                                               cls.sequence_length,
                                               cls.n_rnn_inputs])

    # global step used in optimizer
    global_step = tf.Variable(0, trainable=False)

    # Create ANN
    ann_output = cls.ann_model(cls.c)
    # Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in
                                        range(cls.sequence_length)], 1),
                             [cls.batch_size,
                              cls.sequence_length,
                              cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq, cls.x], 2)

    # Create 'unrolled' RNN by creating sequence_length many RNN cells that
    # share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        # Create RNNs
        daily_prediction, daily_prediction1 = [cls.rnn_lower_model(cls.input_place)] * 2

During training, each mini-batch is computed in two steps:

RNNinput = sess.run(cls.rnn_input, feed_dict={cls.x: batch_x,
                                              cls.y: batch_y,
                                              cls.c: batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place: RNNinput,
                                       cls.y: batch_y,
                                       cls.x: batch_x,
                                       cls.c: batch_c})

Thanks for your help. Any ideas would be appreciated.

Why do you have two feed_dicts? –

The second one is the same as the first, but it also includes 'RNNinput', the result of the first 'sess.run'. That is how I pass the output of the lower layer, with the shared RNN cells, to the upper layer. I do this in the second 'sess.run' call through the placeholder 'cls.input_place'. Unfortunately, that breaks tensorflow's backpropagation computation. – AlexR

You shouldn't do it that way. You can build a single graph as mentioned in the link, feed the input once, and let the whole network train. Is there any reason you can't do that? –
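
(As an aside, the single-graph training suggested here would collapse the asker's two sess.run calls into one. A minimal, hypothetical version of that training step is sketched below; it assumes the lower RNNs consume cls.rnn_input directly, with no cls.input_place placeholder in between, so gradients can flow from the optimizer all the way back into the ANN.)

# Hypothetical single training step: everything lives in one graph,
# so only the raw inputs need to be fed.
_ = sess.run(cls.optimizer, feed_dict={cls.x: batch_x,
                                       cls.y: batch_y,
                                       cls.c: batch_c})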

Answers

You have three different inputs, input_1, input_2, input_3, that you feed into an LSTM model with shared parameters. You then concatenate the outputs of the three LSTMs and pass the result to a final LSTM layer. The code should look something like this:

# Create input placeholder for the network 
input_1 = tf.placeholder(...) 
input_2 = tf.placeholder(...) 
input_3 = tf.placeholder(...) 

# create a shared rnn layer 
def shared_rnn(...): 
    ... 
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...) 

# generate the outputs for each input 
with tf.variable_scope('lower_lstm') as scope: 
    out_input_1 = shared_rnn(...) 
    scope.reuse_variables() # the variables will be reused. 
    out_input_2 = shared_rnn(...) 
    scope.reuse_variables() 
    out_input_3 = shared_rnn(...) 

# verify whether the variables are reused 
for v in tf.global_variables(): 
    print(v.name) 

# concat the three outputs 
output = tf.concat... 

# Pass it to the final_lstm layer and out the logits 
logits = final_layer(output, ...) 

train_op = ... 

# train 
sess.run(train_op, feed_dict={input_1: in1, input_2: in2,
                              input_3: in3, labels: ...})
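
To make the outline above concrete, here is one way shared_rnn could be filled in. This is only a sketch: the sizes (batch_size, seq_len, n_inputs, n_units) are assumptions of mine, and the final layer is omitted. The crucial point is that all three calls happen inside the same tf.variable_scope, with reuse_variables() after the first call, so only one set of lower-LSTM weights is ever created; the tf.name_scope used in the question does not give you this reuse behaviour.

import tensorflow as tf

# Assumed sizes, not taken from the original post.
batch_size, seq_len, n_inputs, n_units = 32, 10, 8, 64

input_1 = tf.placeholder(tf.float32, [batch_size, seq_len, n_inputs])
input_2 = tf.placeholder(tf.float32, [batch_size, seq_len, n_inputs])
input_3 = tf.placeholder(tf.float32, [batch_size, seq_len, n_inputs])

def shared_rnn(x):
    # Variables are created inside the caller's variable_scope, so every call
    # made after scope.reuse_variables() reuses the same LSTM weights.
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(n_units)
    x_steps = tf.unstack(x, seq_len, axis=1)        # list of [batch, n_inputs]
    outputs, _ = tf.contrib.rnn.static_rnn(rnn_cell, x_steps, dtype=tf.float32)
    return outputs[-1]                              # last output of each sequence

with tf.variable_scope('lower_lstm') as scope:
    out_input_1 = shared_rnn(input_1)
    scope.reuse_variables()   # everything built from here reuses the same weights
    out_input_2 = shared_rnn(input_2)
    out_input_3 = shared_rnn(input_3)

# Verify that only one set of lower-LSTM variables was created.
for v in tf.global_variables():
    print(v.name)

# Concatenate the three branch outputs; this tensor would then go into the
# final LSTM layer exactly as in the outline above.
output = tf.concat([out_input_1, out_input_2, out_input_3], axis=1)

Because the three branches are ordinary tensors inside one graph, a single sess.run on the training op trains the shared weights through all three inputs at once.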

Thank you. This is more like what I want to do. – AlexR

I ended up rethinking my architecture a bit and came up with a more workable solution.

Instead of duplicating the middle layer of LSTM cells to create three different cells with identical weights, I chose to run the same cell three times. The result of each run is stored in a "buffer", similar to a tf.Variable, and that whole buffer is then used as the input to the final LSTM layer. I drew a diagram here.

Implemented this way, valid outputs are available after 3 time steps, and tensorflow's backpropagation is not broken (i.e. the nodes in the ANN can still be trained).

The only tricky part was making sure the buffer was ordered correctly for the final RNN.
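
For what it is worth, a minimal sketch of this "run the same cell three times and buffer the results" idea is shown below. The names (branch_inputs, n_units, final_lstm and so on) are placeholders of my own, not from the answer, and instead of an explicit tf.Variable buffer the three outputs are simply collected in an ordered Python list and stacked with tf.stack, which keeps the graph differentiable end to end.

import tensorflow as tf

# Assumed sizes; adjust to the real model.
batch_size, seq_len, n_in, n_units = 32, 10, 8, 64

# One input sequence per run of the shared cell (the three orange cells).
branch_inputs = [tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
                 for _ in range(3)]

cell = tf.nn.rnn_cell.BasicLSTMCell(n_units)      # a single cell object

buffer = []                                       # outputs kept in run order
with tf.variable_scope('shared_cell') as scope:
    for i, seq in enumerate(branch_inputs):
        if i > 0:
            scope.reuse_variables()               # later runs reuse the weights
        outputs, _ = tf.nn.dynamic_rnn(cell, seq, dtype=tf.float32)
        buffer.append(outputs[:, -1, :])          # fixed order matters for the final RNN

# The ordered buffer becomes the input sequence of the final LSTM layer.
# Because it is built from tensors rather than re-fed through a placeholder,
# gradients flow back through the shared cell and anything upstream of it.
final_input = tf.stack(buffer, axis=1)            # [batch_size, 3, n_units]
with tf.variable_scope('final_lstm'):
    final_cell = tf.nn.rnn_cell.BasicLSTMCell(n_units)
    final_out, _ = tf.nn.dynamic_rnn(final_cell, final_input, dtype=tf.float32)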
