
I have been trying to implement a basic backpropagation neural network in Python, and have finished programming the initialization and training of the weight sets. However, on every set I train, the (mean-squared) error always converges to some odd number - the error keeps decreasing over further iterations, but never really gets close to zero.
Any help would be much appreciated.

import numpy as np

class NeuralNetwork:
    def __init__(self, shape):
        # Instance attributes; class-level mutable defaults would be
        # shared across every instance of the class.
        self.shape = shape
        self.layers = len(shape) - 1
        self.weights = []
        self.layerIn = []
        self.layerOut = []

        # One (m, n+1) weight matrix per layer; the extra column holds the bias.
        for i in range(self.layers):
            n = shape[i]
            m = shape[i + 1]
            self.weights.append(np.random.normal(scale=0.2, size=(m, n + 1)))

    def sgm(self, x):
        return 1 / (1 + np.exp(-x))

    def dersgm(self, x):
        # Sigmoid derivative: s(x) * (1 - s(x)).
        y = self.sgm(x)
        return y * (1 - y)

    def run(self, input):
        self.layerIn = []
        self.layerOut = []

        for i in range(self.layers):
            if i == 0:
                # Stack a row of ones so the bias column takes effect.
                layer = self.weights[0].dot(
                    np.vstack((input.T, np.ones([1, input.shape[0]]))))
            else:
                layer = self.weights[i].dot(
                    np.vstack((self.layerOut[-1], np.ones([1, input.shape[0]]))))
            self.layerIn.append(layer)
            self.layerOut.append(self.sgm(layer))

        return self.layerOut[-1].T

    def backpropagate(self, input, y, learning_rate):
        deltas = []
        y_hat = self.run(input)

        # Calculate deltas, output layer first.
        for i in reversed(range(self.layers)):
            if i == self.layers - 1:
                error = y_hat - y
                msq_error = np.sum(.5 * error ** 2)
                # Delta: k rows for k examples, m columns for m nodes. The
                # derivative is evaluated at the pre-activation input, not
                # at the sigmoid output.
                deltas.append(error * self.dersgm(self.layerIn[-1]).T)
            else:
                # Propagate the next layer's delta back through the weights,
                # excluding the bias column.
                error = deltas[-1].dot(self.weights[i + 1][:, :-1])
                deltas.append(self.dersgm(self.layerIn[i]).T * error)

        # Calculate weight gradients.
        ordered_deltas = list(reversed(deltas))  # deltas were built backwards
        wdelta = []

        # Each gradient sums over the k rows of deltas for the k training
        # examples, giving one update per weight matrix.
        for i in range(self.layers):
            if i == 0:
                # Add the bias row to the raw input.
                input_with_bias = np.vstack((input.T, np.ones(input.shape[0])))
                wdelta.append(ordered_deltas[i].T.dot(input_with_bias.T))
            else:
                with_bias = np.vstack(
                    (self.layerOut[i - 1], np.ones(input.shape[0])))
                wdelta.append(ordered_deltas[i].T.dot(with_bias.T))

        # Gradient-descent step: move against the gradient.
        for i in range(self.layers):
            self.weights[i] -= learning_rate * wdelta[i]

        return msq_error

    def train(self, input, target, lr, run_iter):
        for i in range(run_iter):
            # Backpropagate on every iteration; print the error periodically.
            err = self.backpropagate(input, target, lr)
            if i % 100000 == 0:
                print(err)
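For concreteness, a minimal usage sketch with the shapes described in the comments below (a 4x2 input and a 4x1 target); the XOR-style data here is made up for illustration:

import numpy as np

# Hypothetical example data: 4x2 input, 4x1 target (XOR-style).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

net = NeuralNetwork((2, 5, 1))   # 2 inputs, 5 hidden nodes, 1 output
net.train(X, y, lr=0.5, run_iter=200000)
print(net.run(X))                # outputs should approach the targets in y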

What should the input and target look like (their shapes)? – matousc


The input is a 4x2 matrix, and the target is a 4x1 matrix (a column vector). –


Gradient descent is sensitive to things like the scaling of the inputs and an unsuitable step size... have you checked those? Have you tested whether the code above actually computes the gradient it claims to, etc.? –
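One way to follow that suggestion is a finite-difference gradient check; a rough sketch, assuming the network above is in scope (the helper name numeric_grad is made up for illustration):

import numpy as np

def numeric_grad(net, X, y, i, r, c, eps=1e-5):
    """Numerical d(loss)/d(weights[i][r, c]) via central differences."""
    def loss():
        err = net.run(X) - y
        return np.sum(.5 * err ** 2)
    net.weights[i][r, c] += eps
    up = loss()
    net.weights[i][r, c] -= 2 * eps
    down = loss()
    net.weights[i][r, c] += eps   # restore the original weight
    return (up - down) / (2 * eps)

# Compare this number against the analytic wdelta[i][r, c] computed in
# backpropagate; a large mismatch points to a gradient bug.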

Answer


The error function cannot be 0 in a case like the following, because an error of 0 would require the points to match the fitted curve perfectly.

[Figure: data fitting]

Having more neurons will certainly reduce the error, since the function can take on a more complex and precise shape. But fitting the data too closely leads to a problem called overfitting, illustrated in the figure below. From left to right, the curves underfit the data set, fit it almost correctly, and then overfit it on the right.

[Figure: underfitting vs. overfitting]

The overfit on the right would drive the error to 0, but that is undesirable, and you want to avoid it. How?

The simplest way to determine whether the number of neurons in the network is ideal (a good fit) is trial and error. Split your data into training data (80%, used to train the network) and test data (20%, held out and used only to evaluate the trained network). While training only on the training data, you can plot the network's performance on the test data set.
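A minimal sketch of that split, assuming the data already sits in NumPy arrays X and y:

import numpy as np

idx = np.random.permutation(len(X))        # shuffle before splitting
cut = int(0.8 * len(X))                    # 80% train, 20% test
X_train, X_test = X[idx[:cut]], X[idx[cut:]]
y_train, y_test = y[idx[:cut]], y[idx[cut:]]

net = NeuralNetwork((X.shape[1], 5, 1))
for step in range(10000):
    net.backpropagate(X_train, y_train, 0.5)
    if step % 1000 == 0:
        test_err = np.sum(.5 * (net.run(X_test) - y_test) ** 2)
        print(step, test_err)              # rising test error signals overfitting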

You can also use a third data set for validation; see: whats is the difference between train, validation and test set, in neural networks?