
All gradients vanish in Caffe when using BatchNorm

I am running into a problem where the gradients vanish when I use batch normalization from Caffe. Here is the code I use in train_val.prototxt.

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "conv0"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 32
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.0589
    }
    bias_filler {
      type: "constant"
      value: 0
    }
    engine: CUDNN
  }
}
layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}

layer {
  name: "conv16"
  type: "Convolution"
  bottom: "conv1"
  top: "conv16"
  param {
    lr_mult: 1
    decay_mult: 1
  }

However, training does not converge. If I remove the BN layers (BatchNorm + Scale), training converges. So I started comparing the log files with and without the BN layers. Below are the log files produced with debug_info = true:

With BN:

I0804 10:22:42.074671 8318 net.cpp:638]  [Forward] Layer loadtestdata, top blob data data: 0.368457 
I0804 10:22:42.074757 8318 net.cpp:638]  [Forward] Layer loadtestdata, top blob label data: 0.514496 
I0804 10:22:42.076117 8318 net.cpp:638]  [Forward] Layer conv0, top blob conv0 data: 0.115678 
I0804 10:22:42.076200 8318 net.cpp:650]  [Forward] Layer conv0, param blob 0 data: 0.0455077 
I0804 10:22:42.076273 8318 net.cpp:650]  [Forward] Layer conv0, param blob 1 data: 0 
I0804 10:22:42.076539 8318 net.cpp:638]  [Forward] Layer relu0, top blob conv0 data: 0.0446758 
I0804 10:22:42.078435 8318 net.cpp:638]  [Forward] Layer conv1, top blob conv1 data: 0.0675479 
I0804 10:22:42.078516 8318 net.cpp:650]  [Forward] Layer conv1, param blob 0 data: 0.0470226 
I0804 10:22:42.078589 8318 net.cpp:650]  [Forward] Layer conv1, param blob 1 data: 0 
I0804 10:22:42.079108 8318 net.cpp:638]  [Forward] Layer bnorm1, top blob conv1 data: 0 
I0804 10:22:42.079197 8318 net.cpp:650]  [Forward] Layer bnorm1, param blob 0 data: 0 
I0804 10:22:42.079270 8318 net.cpp:650]  [Forward] Layer bnorm1, param blob 1 data: 0 
I0804 10:22:42.079350 8318 net.cpp:650]  [Forward] Layer bnorm1, param blob 2 data: 0 
I0804 10:22:42.079421 8318 net.cpp:650]  [Forward] Layer bnorm1, param blob 3 data: 0 
I0804 10:22:42.079505 8318 net.cpp:650]  [Forward] Layer bnorm1, param blob 4 data: 0 
I0804 10:22:42.080267 8318 net.cpp:638]  [Forward] Layer scale1, top blob conv1 data: 0 
I0804 10:22:42.080345 8318 net.cpp:650]  [Forward] Layer scale1, param blob 0 data: 1 
I0804 10:22:42.080418 8318 net.cpp:650]  [Forward] Layer scale1, param blob 1 data: 0 
I0804 10:22:42.080651 8318 net.cpp:638]  [Forward] Layer relu1, top blob conv1 data: 0 
I0804 10:22:42.082074 8318 net.cpp:638]  [Forward] Layer conv16, top blob conv16 data: 0 
I0804 10:22:42.082154 8318 net.cpp:650]  [Forward] Layer conv16, param blob 0 data: 0.0485365 
I0804 10:22:42.082226 8318 net.cpp:650]  [Forward] Layer conv16, param blob 1 data: 0 
I0804 10:22:42.082675 8318 net.cpp:638]  [Forward] Layer loss, top blob loss data: 42.0327 

Without BN:

I0803 17:01:29.700850 30274 net.cpp:638]  [Forward] Layer loadtestdata, top blob data data: 0.320584 
I0803 17:01:29.700920 30274 net.cpp:638]  [Forward] Layer loadtestdata, top blob label data: 0.236383 
I0803 17:01:29.701556 30274 net.cpp:638]  [Forward] Layer conv0, top blob conv0 data: 0.106141 
I0803 17:01:29.701633 30274 net.cpp:650]  [Forward] Layer conv0, param blob 0 data: 0.0467062 
I0803 17:01:29.701692 30274 net.cpp:650]  [Forward] Layer conv0, param blob 1 data: 0 
I0803 17:01:29.701835 30274 net.cpp:638]  [Forward] Layer relu0, top blob conv0 data: 0.0547961 
I0803 17:01:29.702193 30274 net.cpp:638]  [Forward] Layer conv1, top blob conv1 data: 0.0716117 
I0803 17:01:29.702267 30274 net.cpp:650]  [Forward] Layer conv1, param blob 0 data: 0.0473551 
I0803 17:01:29.702327 30274 net.cpp:650]  [Forward] Layer conv1, param blob 1 data: 0 
I0803 17:01:29.702425 30274 net.cpp:638]  [Forward] Layer relu1, top blob conv1 data: 0.0318472 
I0803 17:01:29.702781 30274 net.cpp:638]  [Forward] Layer conv16, top blob conv16 data: 0.0403702 
I0803 17:01:29.702847 30274 net.cpp:650]  [Forward] Layer conv16, param blob 0 data: 0.0474007 
I0803 17:01:29.702908 30274 net.cpp:650]  [Forward] Layer conv16, param blob 1 data: 0 
I0803 17:01:29.703228 30274 net.cpp:638]  [Forward] Layer loss, top blob loss data: 11.2245 

The strange thing is that, starting from batchnorm, every layer onward gives 0!! It is also worth mentioning that ReLU (an in-place layer) only gets 4 lines, but batchnorm and scale (which should also be in-place layers) get 6 and 3 lines in the log file. Do you know what is wrong?


What version of caffe are you using? – Shai

Answer


I don't know what is wrong with your "BatchNorm" layer, but it is very odd:
according to your debug log, your "BatchNorm" layer has five internal param blobs (0..4). Looking at the source code of batch_norm_layer.cpp, there should be only three internal param blobs:

this->blobs_.resize(3); 
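For reference, here is an abridged sketch of how the stock BVLC BatchNormLayer sets up those three blobs in LayerSetUp (paraphrased, not a verbatim excerpt), with comments indicating what each blob holds:

// Paraphrased sketch of BatchNormLayer<Dtype>::LayerSetUp (batch_norm_layer.cpp),
// not a verbatim excerpt: the layer owns exactly three internal blobs.
this->blobs_.resize(3);
vector<int> sz;
sz.push_back(channels_);
this->blobs_[0].reset(new Blob<Dtype>(sz));  // blob 0: running mean (one value per channel)
this->blobs_[1].reset(new Blob<Dtype>(sz));  // blob 1: running variance (one value per channel)
sz[0] = 1;
this->blobs_[2].reset(new Blob<Dtype>(sz));  // blob 2: moving-average factor (a single scalar)

So with a stock BVLC build, the debug log should show param blobs 0..2 for bnorm1, not 0..4.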

I suggest you make sure the "BatchNorm" implementation you are using is not buggy.


Regarding the debug log, you can read here more about how to interpret it.
As for your question that

"ReLU [...] only gets 4 lines, but batchnorm and scale [...] get 6 and 3 lines in the log file"

Note that each layer gets one "top blob ... data" line per output blob, reporting the mean absolute value of that blob's data.
In addition, each layer gets one extra line for each of its internal parameter blobs. A "ReLU" layer has no internal parameters, so there is no "param blob [...] data" printout for it. A "Convolution" layer has two internal parameters (the kernel and the bias), hence the two extra lines for blob 0 and blob 1.
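To make the line counting concrete, below is a small self-contained C++ sketch (hypothetical code, not taken from Caffe's sources) that mimics this one-line-per-blob bookkeeping. With a stock three-blob "BatchNorm" you would expect 1 + 3 = 4 lines for bnorm1, whereas your log shows 1 + 5 = 6:

// Hypothetical illustration (not Caffe source): debug_info emits one
// "top blob ... data" line per output blob and one "param blob ... data"
// line per internal parameter blob of each layer.
#include <iostream>
#include <string>
#include <vector>

struct LayerInfo {
  std::string name;                    // layer name as it appears in the log
  std::vector<std::string> top_blobs;  // names of the output blobs
  std::size_t num_param_blobs;         // number of internal parameter blobs
};

void print_forward_debug_lines(const LayerInfo& layer) {
  for (const std::string& top : layer.top_blobs) {
    std::cout << "[Forward] Layer " << layer.name
              << ", top blob " << top << " data: ...\n";
  }
  for (std::size_t i = 0; i < layer.num_param_blobs; ++i) {
    std::cout << "[Forward] Layer " << layer.name
              << ", param blob " << i << " data: ...\n";
  }
}

int main() {
  // ReLU: no internal params            -> 1 line
  // Convolution: kernel + bias          -> 1 + 2 = 3 lines
  // Stock BatchNorm: mean, var, factor  -> 1 + 3 = 4 lines
  print_forward_debug_lines({"relu1",  {"conv1"}, 0});
  print_forward_debug_lines({"conv1",  {"conv1"}, 2});
  print_forward_debug_lines({"bnorm1", {"conv1"}, 3});
  return 0;
}

Running it prints one line for relu1, three lines for conv1, and four lines for a three-blob bnorm1, which is the pattern a non-buggy "BatchNorm" should produce.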