Cannot replicate a MatConvNet CNN architecture in Keras

I have a convolutional neural network in MatConvNet with the following architecture, which I trained on my own data:

function net = cnn_mnist_init(varargin) 
% CNN_MNIST_LENET Initialize a CNN similar for MNIST 
opts.batchNormalization = false ; 
opts.networkType = 'simplenn' ; 
opts = vl_argparse(opts, varargin) ; 

f= 0.0125 ; 
net.layers = {} ; 
net.layers{end+1} = struct('name','conv1',... 
          'type', 'conv', ... 
          'weights', {{f*randn(3,3,1,64, 'single'), zeros(1, 64, 'single')}}, ... 
          'stride', 1, ... 
          'pad', 0,... 
          'learningRate', [1 2]) ; 
net.layers{end+1} = struct('name','pool1',... 
          'type', 'pool', ... 
          'method', 'max', ... 
          'pool', [3 3], ... 
          'stride', 1, ... 
          'pad', 0); 
net.layers{end+1} = struct('name','conv2',... 
          'type', 'conv', ... 
          'weights', {{f*randn(5,5,64,128, 'single'),zeros(1,128,'single')}}, ... 
          'stride', 1, ... 
          'pad', 0,... 
          'learningRate', [1 2]) ; 
net.layers{end+1} = struct('name','pool2',... 
          'type', 'pool', ... 
          'method', 'max', ... 
          'pool', [2 2], ... 
          'stride', 2, ... 
          'pad', 0) ; 
net.layers{end+1} = struct('name','conv3',... 
          'type', 'conv', ... 
          'weights', {{f*randn(3,3,128,256, 'single'),zeros(1,256,'single')}}, ... 
          'stride', 1, ... 
          'pad', 0,... 
          'learningRate', [1 2]) ; 
net.layers{end+1} = struct('name','pool3',... 
          'type', 'pool', ... 
          'method', 'max', ... 
          'pool', [3 3], ... 
          'stride', 1, ... 
          'pad', 0) ; 
net.layers{end+1} = struct('name','conv4',... 
          'type', 'conv', ... 
          'weights', {{f*randn(5,5,256,512, 'single'),zeros(1,512,'single')}}, ... 
          'stride', 1, ... 
          'pad', 0,... 
          'learningRate', [1 2]) ; 
net.layers{end+1} = struct('name','pool4',... 
          'type', 'pool', ... 
          'method', 'max', ... 
          'pool', [2 2], ... 
          'stride', 1, ... 
          'pad', 0) ; 
net.layers{end+1} = struct('name','ip1',... 
          'type', 'conv', ... 
          'weights', {{f*randn(1,1,256,256, 'single'), zeros(1,256,'single')}}, ... 
          'stride', 1, ... 
          'pad', 0,... 
          'learningRate', [1 2]) ; 
net.layers{end+1} = struct('name','relu',... 
          'type', 'relu'); 
net.layers{end+1} = struct('name','classifier',... 
          'type', 'conv', ... 
          'weights', {{f*randn(1,1,256,2, 'single'), zeros(1,2,'single')}}, ... 
          'stride', 1, ... 
          'pad', 0,... 
          'learningRate', [1 2]) ; 
net.layers{end+1} = struct('name','loss',... 
          'type', 'softmaxloss') ; 

% optionally switch to batch normalization 
if opts.batchNormalization 
    net = insertBnorm(net, 1) ; 
    net = insertBnorm(net, 4) ; 
    net = insertBnorm(net, 7) ; 
    net = insertBnorm(net, 10) ; 
    net = insertBnorm(net, 13) ; 
end 

% Meta parameters 
net.meta.inputSize = [28 28 1] ; 
net.meta.trainOpts.learningRate = [0.01*ones(1,10) 0.001*ones(1,10) 0.0001*ones(1,10)]; 
disp(net.meta.trainOpts.learningRate); 
pause; 
net.meta.trainOpts.numEpochs = length(net.meta.trainOpts.learningRate) ; 
net.meta.trainOpts.batchSize = 256 ; 
net.meta.trainOpts.momentum = 0.9 ; 
net.meta.trainOpts.weightDecay = 0.0005 ; 

% -------------------------------------------------------------------- 
function net = insertBnorm(net, l) 
% -------------------------------------------------------------------- 
assert(isfield(net.layers{l}, 'weights')); 
ndim = size(net.layers{l}.weights{1}, 4); 
layer = struct('type', 'bnorm', ... 
       'weights', {{ones(ndim, 1, 'single'), zeros(ndim, 1, 'single')}}, ... 
       'learningRate', [1 1], ... 
       'weightDecay', [0 0]) ; 
net.layers{l}.biases = [] ; 
net.layers = horzcat(net.layers(1:l), layer, net.layers(l+1:end)) ;
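Walking the feature-map sizes through the layer list is a quick way to check a port. The sketch below transcribes the layers of `cnn_mnist_init` into a plain Python table (`LAYERS` and `trace` are illustrative names, not MatConvNet functions) and applies the pad-0 output-size rule floor((H - k) / stride) + 1 to the 28×28 input from `net.meta.inputSize`. The `relu` layer is left out because it does not change shapes. One detail it surfaces: `ip1` is declared with a filter depth of 256, but `conv4` outputs 512 channels. My understanding (worth confirming in the `vl_nnconv` documentation) is that MatConvNet then runs a grouped convolution with 512 / 256 = 2 groups of 128 filters each, which a plain Keras `Conv2D(256, (1, 1))` over all 512 channels does not reproduce.

```python
# Layers transcribed from cnn_mnist_init (all pad = 0):
# (name, kind, kernel size, stride, output channels, declared filter depth)
LAYERS = [
    ("conv1", "conv", 3, 1, 64, 1),
    ("pool1", "pool", 3, 1, None, None),
    ("conv2", "conv", 5, 1, 128, 64),
    ("pool2", "pool", 2, 2, None, None),
    ("conv3", "conv", 3, 1, 256, 128),
    ("pool3", "pool", 3, 1, None, None),
    ("conv4", "conv", 5, 1, 512, 256),
    ("pool4", "pool", 2, 1, None, None),
    ("ip1", "conv", 1, 1, 256, 256),
    ("classifier", "conv", 1, 1, 2, 256),
]


def trace(size=28, channels=1):
    """Return (name, spatial size, channels, groups) after each conv/pool layer."""
    rows = []
    for name, kind, k, stride, out_ch, depth in LAYERS:
        size = (size - k) // stride + 1  # pad-0 output size
        groups = 1
        if kind == "conv":
            # Assumption: a filter depth that divides the input channel count
            # is treated as grouped convolution (groups = channels / depth).
            assert channels % depth == 0, f"{name}: depth {depth} vs {channels} channels"
            groups = channels // depth
            channels = out_ch
        rows.append((name, size, channels, groups))
    return rows


for row in trace():
    print(row)
```

The spatial size reaches 1 at `pool4`, so in Keras `Flatten` followed by `Dense` acts on a 1×1 map, just like the 1×1 `classifier` conv.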

What I want to do is build the same architecture in Keras. This is what I have tried so far:

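import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense

input_shape = (28, 28, 1)  # net.meta.inputSize in the MatConvNet version (channels last)
num_classes = 2            # the MatConvNet classifier has 2 outputs

model = Sequential()

# conv1 / pool1 (no activation, as in the MatConvNet network)
model.add(Conv2D(64, (3, 3), strides=1, input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(3, 3), strides=1))

# conv2 / pool2
model.add(Conv2D(128, (5, 5), strides=1))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))

# conv3 / pool3
model.add(Conv2D(256, (3, 3), strides=1))
model.add(MaxPooling2D(pool_size=(3, 3), strides=1))

# conv4 / pool4
model.add(Conv2D(512, (5, 5), strides=1))
model.add(MaxPooling2D(pool_size=(2, 2), strides=1))

# ip1 + relu
model.add(Conv2D(256, (1, 1)))
convout1 = Activation('relu')
model.add(convout1)

# classifier + softmax (Dense plays the role of the 1x1 "classifier" conv)
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))

# note: in Keras 2, `decay` is per-update learning-rate decay, not L2 weight decay
opt = keras.optimizers.rmsprop(lr=0.0001, decay=0.0005)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['binary_accuracy'])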
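One concrete mismatch in this snippet is `decay=0.0005`. In the Keras 2 optimizers, `decay` is time-based learning-rate decay: the rate is multiplied by 1 / (1 + decay × iterations), with iterations counted in batches. In the MatConvNet setup, `weightDecay = 0.0005` is L2 weight decay on the parameters, which is a different thing. The helpers below (hypothetical names) only do the arithmetic, assuming weight decay in the plain form gradient + λ·w.

```python
def keras_time_based_lr(lr0, decay, iterations):
    """Effective learning rate of a Keras 2 optimizer after `iterations` batches."""
    return lr0 / (1.0 + decay * iterations)


def l2_coefficient_for_weight_decay(weight_decay):
    """Coefficient c for an L2 loss penalty c * sum(w**2) matching weight decay.

    Weight decay adds weight_decay * w to the gradient; the penalty
    c * sum(w**2) contributes 2 * c * w, so c = weight_decay / 2.
    """
    return weight_decay / 2.0


# The question's RMSprop settings: the learning rate halves after 2000 batches.
for it in (0, 1000, 2000, 10000):
    print(it, keras_time_based_lr(1e-4, 5e-4, it))

print(l2_coefficient_for_weight_decay(0.0005))
```

In Keras, L2 weight decay is attached per layer, for example `Conv2D(64, (3, 3), kernel_regularizer=keras.regularizers.l2(0.00025))`.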

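However, when I run the MatConvNet network I get 87% accuracy, and when I run the Keras version I get 77%. If they are supposed to be the same network and the data is the same, where is the difference? What is wrong with my Keras architecture?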


2017-07-01 mad

Is the BatchNormalization option turned on in your MatConvNet network? Because you haven't added BatchNormalization in Keras. –

No, batch normalization is not enabled. Thanks! – mad

So, shall I turn this into an answer, so you can accept it and make it more visible? –

In your MatConvNet version, you use SGD with momentum.

In Keras, you use RMSprop.

With a different learning rule, you should try different learning rates. Momentum also sometimes helps when training CNNs.
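To make the point about learning rules concrete: classical momentum SGD moves a weight by an amount proportional to lr × gradient, while RMSprop divides the gradient by a running RMS of recent gradients, so its step is roughly lr regardless of how large the gradient is. The same `lr` value therefore means very different step sizes under the two rules. The sketch below simulates both on a constant scalar gradient; `rho = 0.9` and `eps = 1e-7` are assumed values chosen to roughly resemble Keras' RMSprop defaults, not taken from the question.

```python
import math


def sgd_momentum_steps(grads, lr, momentum=0.9):
    """Parameter changes produced by classical momentum SGD."""
    velocity, steps = 0.0, []
    for g in grads:
        velocity = momentum * velocity - lr * g
        steps.append(velocity)
    return steps


def rmsprop_steps(grads, lr, rho=0.9, eps=1e-7):
    """Parameter changes produced by RMSprop (gradient divided by running RMS)."""
    mean_sq, steps = 0.0, []
    for g in grads:
        mean_sq = rho * mean_sq + (1 - rho) * g * g
        steps.append(-lr * g / (math.sqrt(mean_sq) + eps))
    return steps


# Same number of updates, gradients of two different magnitudes.
for g in (0.01, 0.1):
    print(g,
          sgd_momentum_steps([g] * 50, lr=0.01)[-1],
          rmsprop_steps([g] * 50, lr=1e-4)[-1])
```

Scaling the gradient by 10 scales the SGD step by 10 but leaves the RMSprop step almost unchanged (about `lr`). For a starting point closer to the MatConvNet run, `keras.optimizers.SGD(lr=0.01, momentum=0.9)` uses the first-stage learning rate and the momentum from `net.meta.trainOpts`.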

Could you try SGD with momentum in Keras and let me know what happens?
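Note that the MatConvNet learning rate is a per-epoch schedule, not a single value: `net.meta.trainOpts.learningRate` is 0.01 for 10 epochs, then 0.001 for 10, then 0.0001 for 10, and `numEpochs` equals its length (30). A minimal sketch of the same schedule as a Python function, assuming Keras' 0-based epoch counter; `matconvnet_schedule` is a hypothetical name.

```python
def matconvnet_schedule(epoch):
    """Learning rate per epoch, mirroring net.meta.trainOpts.learningRate.

    Epochs are 0-based: 0-9 -> 0.01, 10-19 -> 0.001, 20-29 -> 0.0001.
    """
    if epoch < 10:
        return 0.01
    if epoch < 20:
        return 0.001
    return 0.0001


print([matconvnet_schedule(e) for e in (0, 9, 10, 19, 20, 29)])
```

It can be passed to Keras as `keras.callbacks.LearningRateScheduler(matconvnet_schedule)` in the `callbacks` list of `model.fit`, together with `epochs=30` and `batch_size=256` to mirror `trainOpts`.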

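Another thing that may differ is the initialization. For example, in MatConvNet you use a Gaussian initialization with standard deviation f = 0.0125. In Keras, I'm not sure what the default initialization is.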
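For comparison, the Keras default `kernel_initializer` for `Conv2D` and `Dense` is `'glorot_uniform'`, which samples from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), where fan_in = kernel height × width × input channels and fan_out = kernel height × width × output channels. Its standard deviation, limit / sqrt(3) = sqrt(2 / (fan_in + fan_out)), varies per layer, whereas MatConvNet uses the fixed f = 0.0125 everywhere. The sketch below tabulates both for the layers of `cnn_mnist_init`, using the declared filter shapes (so `ip1` appears with depth 256).

```python
import math

# (name, kernel size, input channels, output channels) as declared in cnn_mnist_init
CONV_LAYERS = [
    ("conv1", 3, 1, 64),
    ("conv2", 5, 64, 128),
    ("conv3", 3, 128, 256),
    ("conv4", 5, 256, 512),
    ("ip1", 1, 256, 256),
    ("classifier", 1, 256, 2),
]

MATCONVNET_STD = 0.0125  # f in cnn_mnist_init


def glorot_uniform_std(kernel, c_in, c_out):
    """Standard deviation of Glorot-uniform weights for a square conv kernel."""
    fan_in = kernel * kernel * c_in
    fan_out = kernel * kernel * c_out
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return limit / math.sqrt(3.0)  # std of U(-limit, limit)


for name, k, c_in, c_out in CONV_LAYERS:
    print(f"{name}: glorot_uniform std {glorot_uniform_std(k, c_in, c_out):.4f}"
          f" vs MatConvNet {MATCONVNET_STD}")
```

To mimic MatConvNet, each layer could be given `kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.0125)`; Keras' default bias initializer is already zeros, matching `zeros(1, K, 'single')`.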

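In general, networks trained without batch normalization are prone to numerical problems. If you use batch normalization in both networks, I bet the results will be similar. Is there any reason you don't want to use batch normalization?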
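As an illustration of what batch normalization does, here is a minimal NumPy sketch of the standard training-mode forward pass (not MatConvNet's own implementation): each channel is normalized to zero mean and unit variance over the batch and spatial positions, then scaled by gamma and shifted by beta. However badly scaled the incoming activations are, the next layer sees inputs on a consistent scale. The input data here is synthetic.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm on an (N, H, W, C) batch, per channel."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)  # per-channel batch mean
    var = x.var(axis=(0, 1, 2), keepdims=True)    # per-channel batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


# Synthetic conv1-sized activations (26x26x64) with a large offset and scale.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=50.0, size=(8, 26, 26, 64))
# gamma = 1 and beta = 0 match the initial weights insertBnorm creates.
y = batch_norm_forward(x, np.ones(64), np.zeros(64))
print(x.std(), y.std())
```

In Keras, the counterpart of `insertBnorm` would be a `BatchNormalization()` layer added right after the layers corresponding to `conv1`, `conv2`, `conv3`, `conv4` and `ip1`, the same positions that indices 1, 4, 7, 10 and 13 select once the earlier insertions shift the layer list.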


2017-07-03 22:14:07 DataHungry

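I will, thanks! – mad

I tried sgd = keras.optimizers.SGD(lr=0.0001, decay=0.0005, momentum=0.9), but the results got worse: 50%. Do you believe the architectures of the two networks are the same? – mad

@mad See the edited answer. – DataHungry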
