
Batch normalization layer is not updating its moving mean and moving variance

When I train, I get perfect overfitting on my training data (as expected). With batch normalization, training is also faster, as expected. However, when training has just finished and I run the same model on the same data with 'is_training' = False, it gives a far worse result. Moreover, every time I look at moving_mean and moving_variance, they are still at their default values. They never update.

(u'main/y/y/moving_mean:0', array([ 0., 0.], dtype=float32)) 
(u'main/y/y/moving_variance:0', array([ 1., 1.], dtype=float32)) 
(u'main/y/y/moving_mean:0', array([ 0., 0.], dtype=float32)) 
(u'main/y/y/moving_variance:0', array([ 1., 1.], dtype=float32)) 
700 with generated means (training = true) 1.0 with saved means (training = false) 0.4911 

I have the update_ops code, but it doesn't seem to do anything. Setting updates_collections=None makes it work, but I've been told that is a suboptimal solution performance-wise.

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 
if update_ops: 
    updates = tf.group(*update_ops) 
    cost = with_dependencies([updates], cost) 
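
For reference, the pattern usually recommended for TF 1.x ties the moving-average update ops to the train op itself through a control dependency, instead of wrapping the cost tensor. A minimal sketch, reusing the names from the code below (cost is assumed to be the loss tensor built after the batch_norm layer):

import tensorflow as tf
from tensorflow.python.training.adam import AdamOptimizer

# batch_norm registers its moving_mean/moving_variance updates in UPDATE_OPS
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # running train_step now also runs the moving-average updates
    train_step = AdamOptimizer().minimize(cost)

Alternatively, passing 'updates_collections': None inside normalizer_params makes batch_norm apply the updates in place, which is the workaround mentioned above.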

My code is below:

import numpy as np 
import tensorflow as tf 
from tensorflow.contrib.layers import fully_connected, softmax, batch_norm 
from tensorflow.python.ops.control_flow_ops import with_dependencies 
from tensorflow.python.training.adam import AdamOptimizer 

batch_size = 100 
input_size = 10 
noise_strength = 4 

class Data(object): 
    def __init__(self,obs,gold): 
     self.obs=obs 
     self.gold=gold 

def generate_data(batch_size,input_size,noise_strength): 
    input = np.random.rand(batch_size, input_size) * noise_strength 
    gold = np.random.randint(0, 2, (batch_size, 1))  # one 0/1 label per example 
    input = input + gold 
    return Data(input,gold) 


def ffnn_model(inputs,num_classes,batch_size,is_training,reuse=False): 
    output = fully_connected(inputs, 
          num_classes * 2, 
          activation_fn=None, 
          normalizer_fn=batch_norm, 
          normalizer_params={'is_training': is_training, 'reuse': reuse, 'scope': 'y'}, 
          reuse=reuse, 
          scope='y' 
          ) 
    y = softmax(tf.reshape(output, [batch_size, num_classes, 2])) 
    return y 


#objective function 
def objective_function(y,gold): 
    indices = tf.stack([tf.range(tf.size(gold)),tf.reshape(gold,[-1])],axis=1) 
    scores = tf.gather_nd(tf.reshape(y,[-1,2]),indices=indices) 
    # return tf.cast(indices,tf.float32),-tf.reduce_mean(tf.log(scores+1e-6)) 
    return -tf.reduce_mean(tf.log(scores+1e-6)) 

def train_op(y,gold): 
    cost = objective_function(y,gold) 
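    # batch_norm puts its moving_mean/moving_variance update ops into 
    # tf.GraphKeys.UPDATE_OPS by default (updates_collections), so they have to 
    # be run explicitly alongside the train step 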
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 
    if update_ops: 
     print "yes to update_ops" 
     print update_ops 
     updates = tf.group(*update_ops) 
     cost = with_dependencies([updates], cost) 
    train_step = AdamOptimizer().minimize(cost) 

    return train_step 

def predictions_op(y): 
    return tf.cast(tf.argmax(y, axis=len(y.get_shape()) - 1), dtype=tf.int32) 

def accuracy_op(y,gold): 
    return tf.reduce_mean(tf.cast(tf.equal(predictions_op(y), gold),tf.float32)) 

def model(batch_size, num_classes, input_size, scope, reuse): 
    with tf.variable_scope(scope) as m: 
     if reuse: 
      m.reuse_variables() 
     is_training = tf.placeholder(tf.bool) 

     x = tf.placeholder(tf.float32, shape=[batch_size, input_size]) 

     y = ffnn_model(x, num_classes=1, batch_size=batch_size, is_training=is_training, reuse=reuse) 

     g = tf.placeholder(tf.int32, shape=[batch_size, num_classes]) 

     return g, x, y, is_training 

def train(batch_size=100,input_size = 100): 
    scope = "main" 

    g, x, y, is_training = model(batch_size, 1, input_size, scope,reuse=None) 

    with tf.Session() as sess: 
     train_step, accuracy,predictions = train_op(y, g), accuracy_op(y, g), predictions_op(y) 
     cost_op = objective_function(y,g) 
     init_op = tf.group(tf.local_variables_initializer(), tf.global_variables_initializer()) 
     sess.run(init_op) 
     accs = [] 
     accs2 = [] 
     costs = [] 
     for i in range(10000): 
      data = generate_data(batch_size, input_size, noise_strength) 
      _,acc,cost = sess.run([train_step,accuracy,cost_op],feed_dict={x:data.obs,g:data.gold,is_training:True}) 
      acc2 = sess.run(accuracy, feed_dict={x: data.obs, g: data.gold, is_training: False}) 
      accs.append(acc) 
      accs2.append(acc2) 
      costs.append(cost) 
      if i%100 == 0: 
       # print scurrs 
       print i,"with generated means (training = true)",np.mean(accs[-100:]),"with saved means (training = false)",np.mean(accs2[-100:]) 
       # print sess.run(predictions, feed_dict={x: data.obs, g: data.gold, is_training: False}) 
       vars = [var for var in tf.global_variables() if 'moving' in var.name] 

       rv = sess.run(vars, {is_training: False}) 
       rt = sess.run(vars, {is_training: True}) 

       print "\t".join([str((v.name, a)) for a, v in zip(rv, vars)]), \ 
        "\n", \ 
        "\t".join([str((v.name, a)) for a, v in zip(rt, vars)]) 


if __name__ == "__main__": 
    train() 

If you're using version 1.0 or later, you might be interested in switching to 'tf.layers.batch_normalization' rather than the 'tf.contrib.layers' version. As far as I know it is functionally identical (though with different parameter names), but the core layers API is more stable than contrib. – DomJack


Update: just realized that the default arguments have changed between the two batch norm implementations. Port carefully to ensure consistency between models - it is not just a name change. – DomJack
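
For what it's worth, a minimal sketch of the switch DomJack suggests, assuming TF 1.x. The helper name dense_with_bn is made up for illustration, and momentum is set explicitly to mirror contrib's decay=0.999 default:

import tensorflow as tf

def dense_with_bn(x, units, is_training, scope):
    # hypothetical replacement for the fully_connected + batch_norm combination above
    with tf.variable_scope(scope):
        h = tf.layers.dense(x, units, activation=None, use_bias=False)
        # renamed/changed defaults vs. tf.contrib.layers.batch_norm:
        # training (was is_training), momentum=0.99 (contrib: decay=0.999),
        # scale=True (contrib: scale=False)
        return tf.layers.batch_normalization(h, training=is_training, momentum=0.999)

The moving averages are still updated only through tf.GraphKeys.UPDATE_OPS, so the control-dependency pattern shown earlier is still required.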

Answer


Batch normalization creates update ops that you have to run for the moving values to be updated. Specifically, it also adds them to a particular collection, and if you use the tf.contrib.layers.optimize_loss function, it collects those ops for you and runs them whenever the training op is run.

So to fix it, replace:

train_step = AdamOptimizer().minimize(cost) 

with: 

train_step = optimize_loss(cost, global_step, learning_rate, optimizer='Adam')
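
Sketched against the question's train_op, and assuming tf.contrib.layers.optimize_loss from TF 1.x (the learning rate below is just a placeholder matching AdamOptimizer's 0.001 default, and the explicit global_step variable is one way to supply the required step counter), that might look like:

import tensorflow as tf
from tensorflow.contrib.layers import optimize_loss

def train_op(y, gold):
    cost = objective_function(y, gold)  # objective_function as defined in the question
    global_step = tf.Variable(0, trainable=False, name='global_step')
    # optimize_loss collects tf.GraphKeys.UPDATE_OPS by default (where batch_norm
    # registers its moving-average updates) and runs them with every training step,
    # so the manual update_ops/with_dependencies block is no longer needed
    return optimize_loss(cost, global_step, learning_rate=0.001, optimizer='Adam')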