Convolutional neural network not converging

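I have been watching videos on deep learning and convolutional neural networks, such as here and here, and I am trying to implement my own network in C++. I kept the input data very simple for a first attempt: the idea is to tell a cross from a circle, and I have a small dataset of about 25 images of each (64*64), which look like this:

The network itself has five layers: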

Convolution (5 filters, size 3, stride 1, with a ReLU) 
MaxPool (size 2) 
Convolution (1 filter, size 3, stride 1, with a ReLU) 
MaxPool (size 2) 
Linear Regression classifier
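
As a sanity check on the buffer sizes, the spatial dimensions through these five layers can be worked out as in the sketch below. It assumes no padding on the convolutions and non-overlapping pooling that rounds down, neither of which the post states. The helper names `ConvOutputSize`, `PoolOutputSize` and `ClassifierInputCount` are hypothetical and not part of the original code.

```cpp
#include <cassert>

// Output width/height of a convolution WITHOUT padding (assumption).
int ConvOutputSize(int inputSize, int filterSize, int stride)
{
    assert(inputSize >= filterSize && stride > 0);
    return (inputSize - filterSize) / stride + 1;
}

// Output width/height of non-overlapping max pooling (stride == pool size),
// rounding down when the input is not evenly divisible (assumption).
int PoolOutputSize(int inputSize, int poolSize)
{
    assert(poolSize > 0);
    return inputSize / poolSize;
}

// Number of values fed to the final linear classifier for a square image.
int ClassifierInputCount(int imageSize)
{
    int size = ConvOutputSize(imageSize, 3, 1); // conv, 5 filters, size 3
    size = PoolOutputSize(size, 2);             // max pool, size 2
    size = ConvOutputSize(size, 3, 1);          // conv, 1 filter, size 3
    size = PoolOutputSize(size, 2);             // max pool, size 2
    return size * size * 1;                     // one filter, so depth is 1
}
```

For a 64*64 input this gives 62, 31, 29 and 14 per side, so the classifier would see 196 values under these assumptions.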

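My problem is that my network does not converge on anything. The weights do not appear to change. When I run it, the predictions mostly stay the same, apart from an occasional outlier that jumps out before returning on the next iteration.

The training for the convolutional layer looks something like this, with some loops removed to make it cleaner: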

// Yeah, I know I should change the shared_ptr<float> 
void ConvolutionalNetwork::Train(std::shared_ptr<float> input,std::shared_ptr<float> outputGradients, float label) 
{ 
    float biasGradient = 0.0f; 

    // Calculate the deltas with respect to the input. 
    for (int layer = 0; layer < m_Filters.size(); ++layer) 
    { 
     // Pseudo-code, each loop on its own line in actual code 
     For z < depth, x <width - filterSize, y < height -filterSize 
     {    
      int newImageIndex = layer*m_OutputWidth*m_OutputHeight+y*m_OutputWidth + x; 

      For the bounds of the filter (U,V) 
      { 
       // Find the index in the input image 
       int imageIndex = x + (y+v)*m_OutputWidth + z*m_OutputHeight*m_OutputWidth; 
       int kernelIndex = u +v*m_FilterSize + z*m_FilterSize*m_FilterSize; 
       m_pGradients.get()[imageIndex] += outputGradients.get()[newImageIndex]*input.get()[imageIndex]; 
       m_GradientSum[layer].get()[kernelIndex] += m_pGradients.get()[imageIndex] * m_Filters[layer].get()[kernelIndex]; 

       biasGradient += m_GradientSum[layer].get()[kernelIndex]; 
      }  
     } 
    } 

    // Update the weights 
    for (int layer = 0; layer < m_Filters.size(); ++layer) 
    { 
     For z < depth, U & V < filtersize 
     { 
      // Find the index in the input image 
      int kernelIndex = u +v*m_FilterSize + z*m_FilterSize*m_FilterSize; 
      m_Filters[layer].get()[kernelIndex] -= learningRate*m_GradientSum[layer].get()[kernelIndex]; 
     } 
     m_pBiases.get()[layer] -= learningRate*biasGradient; 
    } 
}

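So I create a buffer (m_pGradients), the size of the input buffer, to feed gradients back to the previous layer, but I use the gradient sum to adjust the weights.

The max pool passes its gradients back like this (it saves the indices of the maxima and zeroes out all other gradients):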

void MaxPooling::Train(std::shared_ptr<float> input,std::shared_ptr<float> outputGradients, float label) 
{ 
    for (int outputVolumeIndex = 0; outputVolumeIndex <m_OutputVolumeSize; ++outputVolumeIndex) 
    { 
     int inputIndex = m_Indices.get()[outputVolumeIndex]; 
     m_pGradients.get()[inputIndex] = outputGradients.get()[outputVolumeIndex]; 
    } 
}

And finally the regression layer computes its gradient like this:

void LinearClassifier::Train(std::shared_ptr<float> data,std::shared_ptr<float> output, float y) 
{ 
    float * x = data.get(); 

    float biasError = 0.0f; 
    float h = Hypothesis(output) - y; 

    for (int i =1; i < m_NumberOfWeights; ++i) 
    { 
     float error = h*x[i]; 
     m_pGradients.get()[i] = error; 
     biasError += error; 
    } 

    float cost = h; 
    m_Error = cost*cost; 

    for (int theta = 1; theta < m_NumberOfWeights; ++theta) 
    { 
     m_pWeights.get()[theta] = m_pWeights.get()[theta] - learningRate*m_pGradients.get()[theta]; 
    } 

    m_pWeights.get()[0] -= learningRate*biasError; 
}

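After 100 iterations of training on the two examples, the prediction for each one is the same as for the other, and it is unchanged from the start.

Should a convolutional network like this be able to distinguish between the two classes?
Is this the correct approach?
Should I account for the ReLU (max) in the convolution layer's backpropagation?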

Source

2016-02-04 Davors72

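Should a convolutional network like this be able to distinguish between the two classes?

Yes. In fact, even a linear classifier on its own should be able to do it very easily (if the images are more or less centered).

Is this the correct approach?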

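The most probable cause is an error in your gradient formulas. Always follow 2 simple rules:

1. Start with a basic model. Do not start with a 2-conv network. Start your code without any convolutions. Does it work now? Once you have 1 linear layer working, add a single convolution. Does it work now? And so on.
2. Always check your gradients numerically. This is easy to do and will save you hours of debugging! Recall from analysis that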
```
[grad f(x)]_i ~ (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
```
By []_i I mean the i-th coordinate, and by e_i I mean the i-th canonical vector (the zero vector with a one in the i-th coordinate).
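
The check above can be sketched in C++ roughly as follows. This is a minimal illustration rather than code from the question: `NumericalGradient` and `MaxRelativeError` are hypothetical helper names, the loss is passed in as a generic function of a parameter vector, and `double` is used instead of `float` on the assumption that the extra precision keeps the finite difference from being swamped by rounding.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Central-difference estimate of grad f at x:
// grad[i] ~ (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
std::vector<double> NumericalGradient(
    const std::function<double(const std::vector<double>&)>& f,
    std::vector<double> x, // taken by value so it can be perturbed locally
    double eps = 1e-5)
{
    std::vector<double> grad(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const double saved = x[i];
        x[i] = saved + eps;
        const double plus = f(x);
        x[i] = saved - eps;
        const double minus = f(x);
        x[i] = saved; // restore before moving to the next coordinate
        grad[i] = (plus - minus) / (2.0 * eps);
    }
    return grad;
}

// Largest relative difference between two gradients of equal length.
double MaxRelativeError(const std::vector<double>& a,
                        const std::vector<double>& b)
{
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        // Small floor on the scale avoids dividing by zero.
        const double scale = std::max(1e-8, std::fabs(a[i]) + std::fabs(b[i]));
        worst = std::max(worst, std::fabs(a[i] - b[i]) / scale);
    }
    return worst;
}
```

To use it, wrap one layer's forward pass plus loss as `f` over that layer's weights, then compare the result against the gradient your `Train` code produces for the same input. A large relative error on one layer points at that layer's formula.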

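Should I account for the ReLU (max) in the convolution layer's backpropagation?

Yes, the ReLU changes your gradient, because it is a nonlinearity that you need to differentiate through. Again, go back to point 1. Start with a simple model and add each element separately to find which one causes your gradients/model to break.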
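
As a rough sketch of what differentiating through the ReLU means: the gradient arriving from the next layer is passed through only where the pre-activation was positive, and zeroed elsewhere. The function names `ReluForward` and `ReluBackward` are hypothetical and not taken from the question's code, and using a gradient of zero at exactly 0 is a common convention assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward pass: out[i] = max(0, preActivation[i]).
std::vector<float> ReluForward(const std::vector<float>& preActivation)
{
    std::vector<float> out(preActivation.size());
    for (std::size_t i = 0; i < preActivation.size(); ++i)
    {
        out[i] = preActivation[i] > 0.0f ? preActivation[i] : 0.0f;
    }
    return out;
}

// Backward pass: the derivative of max(0, z) is 1 for z > 0 and 0 otherwise
// (0 is assumed at z == 0), so the upstream gradient is masked accordingly.
std::vector<float> ReluBackward(const std::vector<float>& preActivation,
                                const std::vector<float>& upstreamGradient)
{
    assert(preActivation.size() == upstreamGradient.size());
    std::vector<float> grad(preActivation.size());
    for (std::size_t i = 0; i < preActivation.size(); ++i)
    {
        grad[i] = preActivation[i] > 0.0f ? upstreamGradient[i] : 0.0f;
    }
    return grad;
}
```

In a convolution layer with a fused ReLU, this masking would be applied to the incoming output gradients before they are used to compute the filter, bias and input gradients.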

Source

2016-02-04 01:12:20 lejlot

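Thanks! I will try this and get back to you. The images are not centered, they have different colors, and so on, so I think a linear classifier would fail on the whole test set. But I will try both. – Davors72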
