
Implementing L1 regularization: I am currently reading Neural Networks and Deep Learning and I am stuck on a problem. The exercise is to modify the code the author gives, which uses L2 regularization, to use L1 regularization instead, in the mini-batch update.

The original code, which uses L2 regularization, is:

def update_mini_batch(self, mini_batch, eta, lmbda, n): 
    """Update the network's weights and biases by applying gradient 
    descent using backpropagation to a single mini batch. The 
    ``mini_batch`` is a list of tuples ``(x, y)``, ``eta`` is the 
    learning rate, ``lmbda`` is the regularization parameter, and 
    ``n`` is the total size of the training data set. 

    """ 
    nabla_b = [np.zeros(b.shape) for b in self.biases] 
    nabla_w = [np.zeros(w.shape) for w in self.weights] 
    for x, y in mini_batch: 
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    self.weights = [(1-eta*(lmbda/n))*w-(eta/len(mini_batch))*nw 
        for w, nw in zip(self.weights, nabla_w)] 
    self.biases = [b-(eta/len(mini_batch))*nb 
        for b, nb in zip(self.biases, nabla_b)] 

As you can see, self.weights is updated with an L2 regularization term. For L1 regularization, I believe I only need to change that same line to reflect

w -> w - eta*(lmbda/n)*sgn(w) - eta*(dC/dw)

The book says we can estimate the dC/dw term by averaging over the mini-batch. That statement confused me, but I took it to mean using the per-layer average of nabla_w over each mini-batch, which led me to the following edit of the code:

def update_mini_batch(self, mini_batch, eta, lmbda, n): 
    """Update the network's weights and biases by applying gradient 
    descent using backpropagation to a single mini batch. The 
    ``mini_batch`` is a list of tuples ``(x, y)``, ``eta`` is the 
    learning rate, ``lmbda`` is the regularization parameter, and 
    ``n`` is the total size of the training data set. 

    """ 
    nabla_b = [np.zeros(b.shape) for b in self.biases] 
    nabla_w = [np.zeros(w.shape) for w in self.weights] 
    for x, y in mini_batch: 
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    avg_nw = [np.array([[np.average(layer)] * len(layer[0])] * len(layer)) 
       for layer in nabla_w] 
    self.weights = [(1-eta*(lmbda/n))*w-(eta)*nw 
        for w, nw in zip(self.weights, avg_nw)] 
    self.biases = [b-(eta/len(mini_batch))*nb 
        for b, nb in zip(self.biases, nabla_b)] 

But the results I get are almost pure noise, with accuracy around 10%. Am I misinterpreting the statement, or is my code wrong? Any hints would be appreciated.

Answer


That is not correct.

Conceptually, L2 regularization says that after each training iteration we shrink W multiplicatively (geometrically) by some decay factor. That way, if a weight becomes very large, it shrinks by more. This keeps the individual values in W from growing too large.

Conceptually, L1 regularization says that after each training iteration we move W linearly toward zero by some constant, without crossing zero (positive values decrease toward zero but not below it; negative values increase toward zero but not above it). This drives the very small values in W to exactly zero, leaving only the large ones.
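To make the contrast concrete, here is a small numeric sketch; the example weights and the decay amount eta*(lmbda/n) = 0.1 are made up purely for illustration:

import numpy as np

w = np.array([2.0, 0.05, -0.5, -0.02])
decay = 0.1   # stands in for eta*(lmbda/n)

# L2: multiplicative shrink -- larger weights lose more in absolute terms
w_l2 = (1 - decay) * w
# -> [ 1.8, 0.045, -0.45, -0.018]

# L1: subtract a constant, clipped so no weight crosses zero --
# the small weights land exactly on zero
w_l1 = np.sign(w) * np.maximum(np.abs(w) - decay, 0.0)
# -> [ 1.9, 0., -0.4, -0. ]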

Your second formula,

self.weights = [(1-eta*(lmbda/n))*w-(eta)*nw 
       for w, nw in zip(self.weights, avg_nw)] 

does not implement the straight subtraction; it still does the multiplicative (geometric) scaling in (1-eta*(lmbda/n))*w.

Instead, implement a function reduceLinearlyToZero that takes w and eta*(lmbda/n) and returns sign(w) * max(abs(w) - eta*(lmbda/n), 0), applied elementwise:

def reduceLinearlyToZero(w, eta, lmbda, n):
    # shrink each weight toward zero by eta*(lmbda/n), but never cross zero
    return np.sign(w) * np.maximum(np.abs(w) - eta * (lmbda / n), 0.0)


self.weights = [reduceLinearlyToZero(w, eta, lmbda, n) - (eta/len(mini_batch))*nw
                for w, nw in zip(self.weights, nabla_w)]

or possibly

self.weights = [reduceLinearlyToZero(w - (eta/len(mini_batch))*nw, eta, lmbda, n)
                for w, nw in zip(self.weights, nabla_w)]
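For completeness, here is a sketch of the whole update_mini_batch rewritten this way, assuming reduceLinearlyToZero is defined at module level as above and the class layout matches the book's network2.py (this is my reading of the exercise, not the book's official solution). Note that nabla_w already sums the gradients over the mini-batch, so dividing by len(mini_batch) gives the mini-batch average of dC/dw; no separate avg_nw is needed.

def update_mini_batch(self, mini_batch, eta, lmbda, n):
    """Sketch of the L1-regularized update: linear weight decay toward
    zero plus the usual averaged-gradient step."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    # decay each weight linearly toward zero, then take the gradient step;
    # nabla_w divided by len(mini_batch) is the mini-batch average of dC/dw
    self.weights = [reduceLinearlyToZero(w, eta, lmbda, n)
                    - (eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]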

This was very, very helpful. I found the conceptual description of L1 and L2 regularization eye-opening. Thank you! –