這是我的代碼。我試圖建立一個VGG 11層網絡,結合ReLu和ELu激活以及內核和活動的許多正則化。結果令人困惑:代碼是在第10個時代。我在列車和val方面的損失已經從2000年下降到1.5,但我在列車和val方面的表現仍然保持在50%。有人可以向我解釋嗎?Kera與Theano:損失減少但準確性不變
# VGG 11
from keras.regularizers import l2
from keras.layers.advanced_activations import ELU
from keras.optimizers import Adam
model = Sequential()
model.add(Conv2D(64, (3, 3), kernel_initializer='he_normal',
kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001),
input_shape=(1, 96, 96), activation='relu'))
model.add(Conv2D(64, (3, 3), kernel_initializer='he_normal',
kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001),
activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), kernel_initializer='he_normal',
kernel_regularizer=l2(0.0001),activity_regularizer=l2(0.0001),
activation='relu'))
model.add(Conv2D(128, (3, 3), kernel_initializer='he_normal',
kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001),
activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(256, (3, 3), kernel_initializer='he_normal',
kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001),
activation='relu'))
model.add(Conv2D(256, (3, 3), kernel_initializer='he_normal',
kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001),
activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(512, (3, 3), kernel_initializer='he_normal',
kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001),
activation='relu'))
model.add(Conv2D(512, (3, 3), kernel_initializer='he_normal',
kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001),
activation='relu'))
model.add(Conv2D(512, (3, 3), kernel_initializer='he_normal',
kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001),
activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# convert convolutional filters to flat so they can be feed to fully connected layers
model.add(Flatten())
model.add(Dense(2048, kernel_initializer='he_normal',
kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.01)))
model.add(ELU(alpha=1.0))
model.add(Dropout(0.5))
model.add(Dense(1024, kernel_initializer='he_normal',
kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.01)))
model.add(ELU(alpha=1.0))
model.add(Dropout(0.5))
model.add(Dense(2))
model.add(Activation('softmax'))
adammo = Adam(lr=0.0008, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=adammo, metrics=['accuracy'])
hist = model.fit(X_train, y_train, batch_size=48, epochs=20, verbose=1, validation_data=(X_val, y_val))
您正在使用太多正規化 – Nain
謝謝你恩。你能解釋爲什麼acc沒有增加的理論原因嗎?我知道太多正則化會確保將損失降到最低。 – Estellad
@Estellad添加評論爲什麼你投了我給的答案。僅僅因爲你對這個網絡有理論上的偏好,你的初始化,你的ELU,你任意選擇的激活函數並不意味着它是正確的。這很多都不常見。這就是爲什麼我提出了一個完全不同的結構。 – modesitt