
Implementing L1 regularization: I am currently reading Neural Networks and Deep Learning and I am stuck on a problem. The exercise is to modify the code the author gives, which uses L2 regularization, to use L1 regularization instead, in the mini-batch update.

The original code, which uses L2 regularization, is:

def update_mini_batch(self, mini_batch, eta, lmbda, n): 
    """Update the network's weights and biases by applying gradient 
    descent using backpropagation to a single mini batch. The 
    ``mini_batch`` is a list of tuples ``(x, y)``, ``eta`` is the 
    learning rate, ``lmbda`` is the regularization parameter, and 
    ``n`` is the total size of the training data set. 

    """ 
    nabla_b = [np.zeros(b.shape) for b in self.biases] 
    nabla_w = [np.zeros(w.shape) for w in self.weights] 
    for x, y in mini_batch: 
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    self.weights = [(1-eta*(lmbda/n))*w-(eta/len(mini_batch))*nw 
        for w, nw in zip(self.weights, nabla_w)] 
    self.biases = [b-(eta/len(mini_batch))*nb 
        for b, nb in zip(self.biases, nabla_b)] 

As you can see, self.weights is updated with an L2 regularization term. For L1 regularization, I believe I only need to change that same line to reflect

w -> w - eta*(lmbda/n)*sgn(w) - eta*(dC/dw)

The book says we can estimate the dC/dw term by averaging over the mini-batch. That statement confused me, but I took it to mean using the per-layer average of nabla_w over each mini-batch, which led me to the following edit of the code:

def update_mini_batch(self, mini_batch, eta, lmbda, n): 
    """Update the network's weights and biases by applying gradient 
    descent using backpropagation to a single mini batch. The 
    ``mini_batch`` is a list of tuples ``(x, y)``, ``eta`` is the 
    learning rate, ``lmbda`` is the regularization parameter, and 
    ``n`` is the total size of the training data set. 

    """ 
    nabla_b = [np.zeros(b.shape) for b in self.biases] 
    nabla_w = [np.zeros(w.shape) for w in self.weights] 
    for x, y in mini_batch: 
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    avg_nw = [np.array([[np.average(layer)] * len(layer[0])] * len(layer)) 
       for layer in nabla_w] 
    self.weights = [(1-eta*(lmbda/n))*w-(eta)*nw 
        for w, nw in zip(self.weights, avg_nw)] 
    self.biases = [b-(eta/len(mini_batch))*nb 
        for b, nb in zip(self.biases, nabla_b)] 

But the results I get are almost pure noise, with accuracy around 10%. Am I misinterpreting the statement, or is my code wrong? Any hints would be appreciated.

Answer


That is not correct.

Conceptually, L2 regularization says that after each training iteration we shrink W multiplicatively (geometrically) by some decay factor. That way, if a weight becomes very large, it shrinks by more. This keeps the individual values in W from growing too large.

Conceptually, L1 regularization says that after each training iteration we move W linearly toward zero by some constant, without crossing zero (positive values decrease toward zero but not below it; negative values increase toward zero but not above it). This drives the very small values in W to exactly zero, leaving only the large ones.
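To make the contrast concrete, here is a small numeric sketch; the example weights and the decay amount eta*(lmbda/n) = 0.1 are made up purely for illustration:

import numpy as np

w = np.array([2.0, 0.05, -0.5, -0.02])
decay = 0.1   # stands in for eta*(lmbda/n)

# L2: multiplicative shrink -- larger weights lose more in absolute terms
w_l2 = (1 - decay) * w
# -> [ 1.8, 0.045, -0.45, -0.018]

# L1: subtract a constant, clipped so no weight crosses zero --
# the small weights land exactly on zero
w_l1 = np.sign(w) * np.maximum(np.abs(w) - decay, 0.0)
# -> [ 1.9, 0., -0.4, -0. ]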

Your second formula,

self.weights = [(1-eta*(lmbda/n))*w-(eta)*nw 
       for w, nw in zip(self.weights, avg_nw)] 

does not implement the straight subtraction; it still does the multiplicative (geometric) scaling in (1-eta*(lmbda/n))*w.

Instead, implement a function reduceLinearlyToZero that takes w and eta*(lmbda/n) and returns sign(w) * max(abs(w) - eta*(lmbda/n), 0), applied elementwise:

def reduceLinearlyToZero(w, eta, lmbda, n):
    # shrink each weight toward zero by eta*(lmbda/n), but never cross zero
    return np.sign(w) * np.maximum(np.abs(w) - eta * (lmbda / n), 0.0)


self.weights = [reduceLinearlyToZero(w, eta, lmbda, n) - (eta/len(mini_batch))*nw
                for w, nw in zip(self.weights, nabla_w)]

or possibly

self.weights = [reduceLinearlyToZero(w - (eta/len(mini_batch))*nw, eta, lmbda, n)
                for w, nw in zip(self.weights, nabla_w)]
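For completeness, here is a sketch of the whole update_mini_batch rewritten this way, assuming reduceLinearlyToZero is defined at module level as above and the class layout matches the book's network2.py (this is my reading of the exercise, not the book's official solution). Note that nabla_w already sums the gradients over the mini-batch, so dividing by len(mini_batch) gives the mini-batch average of dC/dw; no separate avg_nw is needed.

def update_mini_batch(self, mini_batch, eta, lmbda, n):
    """Sketch of the L1-regularized update: linear weight decay toward
    zero plus the usual averaged-gradient step."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    # decay each weight linearly toward zero, then take the gradient step;
    # nabla_w divided by len(mini_batch) is the mini-batch average of dC/dw
    self.weights = [reduceLinearlyToZero(w, eta, lmbda, n)
                    - (eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]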

This was very, very helpful. I found the conceptual description of L1 and L2 regularization eye-opening. Thank you! –