
The BatchNormalization layer gives unexpected output values. Given the input values [1, 5], normalizing them should produce something like [-1, 1], if I understand correctly, since

mean = 3 
var = 4 
result = (x - mean)/sqrt(var) 
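
That expectation checks out with plain NumPy:

import numpy as np

x = np.array([1.0, 5.0])
mean = x.mean()                    # 3.0
var = x.var()                      # 4.0
print((x - mean) / np.sqrt(var))   # [-1.  1.]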

However, this minimal example

import numpy as np 

import keras 
from keras.models import Model 
from keras.layers import Input 
from keras.layers.normalization import BatchNormalization 
from keras import backend as K 

shape = (1,2,1) 
input = Input(shape=shape) 
x = BatchNormalization(center=False)(input) # no beta 
model = Model(inputs=input, outputs=x) 
model.compile(loss='mse', optimizer='sgd') 

# training with dummy data 
training_in = [np.random.random(size=(10, *shape))] 
training_out = [np.random.random(size=(10, *shape))] 
model.fit(training_in, training_out, epochs=10) 

data_in = np.array([[[[1], [5]]]], dtype=np.float32) 
data_out = model.predict(data_in) 

print('gamma :', K.eval(model.layers[1].gamma)) 
#print('beta :', K.eval(model.layers[1].beta)) 
print('moving_mean:', K.eval(model.layers[1].moving_mean)) 
print('moving_variance:', K.eval(model.layers[1].moving_variance)) 

print('epsilon :', model.layers[1].epsilon) 
print('data_in :', data_in) 
print('data_out:', data_out) 

produces the following output:

gamma : [ 0.80644524] 
moving_mean: [ 0.05885344] 
moving_variance: [ 0.91000736] 
epsilon : 0.001 
data_in : [[[[ 1.] 
    [ 5.]]]] 
data_out: [[[[ 0.79519051] 
    [ 4.17485714]]]] 

So the result is [0.79519051, 4.17485714] rather than [-1, 1].

I took a look at the source, and the values seem to be forwarded to tf.nn.batch_normalization. From that, it looks like the result should be what I expect, but evidently it is not.

So how are the output values computed?

Answers


If you use gamma, the correct formula is actually result = gamma * (x - mean)/sqrt(var), but mean and var in batch normalization are not always the same:

  • During training (fit), they are mean_batch and var_batch, computed from the input values of the batch (they are simply the mean and variance of your batch), just as you are doing. In the meantime, a global moving_mean and moving_variance are learned this way: moving_mean = alpha * moving_mean + (1 - alpha) * mean_batch, where alpha is a kind of learning rate in (0, 1), usually above 0.9 (see the sketch after this list). moving_mean and moving_variance are approximations of the real mean and variance of all your training data. gamma is also learned, by the usual gradient descent, so as to best fit your output.

  • During inference (predict), you simply use the learned values of moving_mean and moving_variance, not mean_batch and var_batch at all. The learned gamma is used as well.
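
Here is a minimal sketch of that moving-average update, using plain NumPy on dummy batches (illustrative only; Keras performs this update internally during fit, initializes moving_mean to 0 and moving_variance to 1, and calls alpha "momentum", with 0.99 as its default):

import numpy as np

alpha = 0.99                                # Keras calls this "momentum"
moving_mean, moving_variance = 0.0, 1.0     # Keras default initial values

# dummy batches standing in for training data
batches = [np.random.random(10) for _ in range(100)]
for batch in batches:
    mean_batch, var_batch = batch.mean(), batch.var()
    moving_mean = alpha * moving_mean + (1 - alpha) * mean_batch
    moving_variance = alpha * moving_variance + (1 - alpha) * var_batch

# converges toward the data's true mean (~0.5) and variance (~1/12)
# as more batches arrive
print(moving_mean, moving_variance)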

So 0.05885344 is just an approximation of the mean of your random input data, and 0.91000736 of its variance, and you are using these to normalize your new data [1, 5]. You can easily check that [0.79519051, 4.17485714] = gamma * ([1, 5] - moving_mean)/sqrt(moving_var).
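
Here is that check with the values printed above (pure NumPy; the small residual gap is the epsilon term the comments below work out):

import numpy as np

gamma = 0.80644524
moving_mean = 0.05885344
moving_var = 0.91000736

x = np.array([1.0, 5.0])
print(gamma * (x - moving_mean) / np.sqrt(moving_var))
# ~[ 0.7957  4.1774], i.e. [0.79519051, 4.17485714] up to epsilon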

EDIT: alpha is called momentum in Keras, in case you want to look it up.
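
For example, passing it explicitly on the layer from the question (momentum is a constructor argument; 0.99 is the Keras default):

from keras.layers import Input
from keras.layers.normalization import BatchNormalization

inp = Input(shape=(1, 2, 1))
x = BatchNormalization(momentum=0.99, center=False)(inp)  # momentum == alpha above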


Awesome, thanks a lot. Can you also tell me where beta goes in the formula when center=True? My guess is output = gamma * (input - moving_mean) / sqrt(moving_variance) + beta, but when I [enable center](https://ideone.com/TNRlFH) the [output](https://ideone.com/d3rROd) doesn't match. –


It should be as you said, according to [this](https://www.tensorflow.org/api_docs/python/tf/nn/batch_normalization) from TensorFlow. [Keras](https://keras.io/layers/normalization/) makes it look like it does output = gamma * ((input - moving_mean) / sqrt(moving_variance) + beta) in their documentation. Yet neither matches exactly, and I don't know why... – gdelab


I found the problem: we weren't using epsilon. The fully correct formula is result = gamma * (input - moving_mean) / sqrt(moving_variance + epsilon) + beta. –


The correct formula is this:

result = gamma * (input - moving_mean)/sqrt(moving_variance + epsilon) + beta 

Here is a script to verify it:

import math 
import numpy as np 
import tensorflow as tf 
from keras import backend as K 

from keras.models import Model 
from keras.layers import Input 
from keras.layers.normalization import BatchNormalization 

np.random.seed(0) 

print('=== keras model ===') 
input_shape = (1,2,1) 
input = Input(shape=input_shape) 
x = BatchNormalization()(input) 
model = Model(inputs=input, outputs=x) 
model.compile(loss='mse', optimizer='sgd') 
training_in = [np.random.random(size=(10, *input_shape))] 
training_out = [np.random.random(size=(10, *input_shape))] 
model.fit(training_in, training_out, epochs=100, verbose=0) 
data_in = [[[1.0], [5.0]]] 
data_model = np.array([data_in]) 
result = model.predict(data_model) 
gamma = K.eval(model.layers[1].gamma) 
beta = K.eval(model.layers[1].beta) 
moving_mean = K.eval(model.layers[1].moving_mean) 
moving_variance = K.eval(model.layers[1].moving_variance) 
epsilon = model.layers[1].epsilon 
print('gamma:          ', gamma) 
print('beta:           ', beta) 
print('moving_mean:    ', moving_mean) 
print('moving_variance:', moving_variance) 
print('epsilon:        ', epsilon) 
print('data_in:        ', data_in) 
print('result:         ', result) 

print('=== numpy ===') 
np_data = [data_in[0][0][0], data_in[0][1][0]] 
np_mean = moving_mean[0] 
np_variance = moving_variance[0] 
np_offset = beta[0] 
np_scale = gamma[0] 
np_result = [np_scale * (x - np_mean)/math.sqrt(np_variance + epsilon) + np_offset for x in np_data] 
print(np_result) 

print('=== tensorflow ===') 
tf_data = tf.constant(data_in) 
tf_mean = tf.constant(moving_mean) 
tf_variance = tf.constant(moving_variance) 
tf_offset = tf.constant(beta) 
tf_scale = tf.constant(gamma) 
tf_variance_epsilon = epsilon 
tf_result = tf.nn.batch_normalization(tf_data, tf_mean, tf_variance, tf_offset, tf_scale, tf_variance_epsilon) 
tf_sess = tf.Session() 
print(tf_sess.run(tf_result)) 

print('=== keras backend ===') 
k_data = K.constant(data_in) 
k_mean = K.constant(moving_mean) 
k_variance = K.constant(moving_variance) 
k_offset = K.constant(beta) 
k_scale = K.constant(gamma) 
k_variance_epsilon = epsilon 
k_result = K.batch_normalization(k_data, k_mean, k_variance, k_offset, k_scale, k_variance_epsilon) 
print(K.eval(k_result)) 

Output:

=== keras model === 
gamma:           [ 0.22297101] 
beta:            [ 0.49253803] 
moving_mean:     [ 0.36868709] 
moving_variance: [ 0.41429576] 
epsilon:         0.001 
data_in:         [[[1.0], [5.0]]] 
result:          [[[[ 0.71096909] 
    [ 2.09494853]]]] 

=== numpy === 
[0.71096905498374263, 2.0949484904433255] 

=== tensorflow === 
[[[ 0.71096909] 
    [ 2.09494853]]] 

=== keras backend === 
[[[ 0.71096909] 
    [ 2.09494853]]]