SGD - loss starts increasing after some iterations

I am trying to implement stochastic gradient descent with two constraints, so I cannot use scikit-learn. Unfortunately, I am already struggling with regular SGD even without the two constraints. The loss (squared loss) on the training set decreases for some iterations, but after a while it starts to increase, as shown in the plot. These are the functions I am using:

def loss_prime_simple(w, node, feature, data):
    x = data[3]
    y = data[2]
    x_f = x[node][feature]
    y_node = y[node]
    ret = (y_node - w[feature] * x_f) * (-x_f)
    return ret

def update_weights(w, data, predecs, children, node, learning_rate):
    len_features = len(data[3][0])
    w_new = np.zeros(len_features)
    for feature_ in range(len_features):
        w_new[feature_] = loss_prime_simple(w, node, feature_, data)
    return w - learning_rate * w_new

def loss_simple(w, data):
    y_p = data[2]
    x = data[3]
    return ((y_p - np.dot(w, np.array(x).T)) ** 2).sum()

This shows the training loss for two different learning-rate settings (0.001 and 0.0001): http://postimg.org/image/43nbmh8x5/

Can anyone spot a mistake, or does anyone have suggestions on how to debug this? Thanks
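A generic way to debug a hand-rolled gradient like this is a finite-difference check: the analytic derivative should match the numeric slope of the loss. Below is a minimal sketch, assuming the loss_simple and loss_prime_simple functions above and a data tuple in the same format; numeric_grad is an illustrative helper, not part of the original code:

import numpy as np

def numeric_grad(loss_fn, w, eps=1e-6):
    # central finite differences: g[f] ~ (L(w + eps*e_f) - L(w - eps*e_f)) / (2*eps)
    g = np.zeros_like(w, dtype=float)
    for f in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[f] += eps
        w_minus[f] -= eps
        g[f] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * eps)
    return g

# hypothetical usage for a single-sample data tuple: the squared loss carries
# a factor of 2 in its true derivative that loss_prime_simple drops, so any
# mismatch beyond that constant factor indicates a wrong derivative
# analytic = np.array([loss_prime_simple(w, 0, f, data) for f in range(len(w))])
# numeric = numeric_grad(lambda w_: loss_simple(w_, data), w)
# print(np.abs(2 * analytic - numeric).max())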

EDIT:

As lejlot pointed out, it would be good to have the data. Here is the data I am using for x (a single sample): http://textuploader.com/5x0f1

y = 2

This gives this loss: http://postimg.org/image/o9d97kt9v/

The updated code:

def loss_prime_simple(w, node, feature, data):
    x = data[3]
    y = data[2]
    x_f = x[node][feature]
    y_node = y[node]
    return -(y_node - w[feature] * x_f) * x_f

def update_weights(w, data, predecs, children, node, learning_rate):
    len_features = len(data[3][0])
    w_new = np.zeros(len_features)
    for feature_ in range(len_features):
        w_new[feature_] = loss_prime_simple(w, node, feature_, data)
    return w - learning_rate * w_new

def loss_simple2(w, data):
    y_p = data[2]
    x = data[3]
    return ((y_p - np.dot(w, np.array(x).T)) ** 2).sum()

import numpy as np

X = np.array([])  # put the array from http://textuploader.com/5x0f1 here; it must be a NumPy array so X.shape works below
y = [2]

data = None, None, y, X

w = np.random.rand(4096)

a = [loss_simple2(w, data)]

for _ in range(200):
    for j in range(X.shape[0]):
        w = update_weights(w, data, None, None, j, 0.0001)
        a.append(loss_simple2(w, data))

from matplotlib import pyplot as plt
plt.figure()
plt.plot(a)
plt.show()

Answer (score 0):

The problem was that I updated the weights with the gradient -(y - w[feature] * x[feature]) * x[feature] instead of -(y - w·x) * x[feature], i.e. the residual has to use the full inner product w·x rather than a single feature's contribution.

So this works:

def update_weights(w, x, y, learning_rate):
    # residual based on the full inner product w.x
    inner_product = 0.0
    for f_ in range(len(x)):
        inner_product += w[f_] * x[f_]
    dloss = inner_product - y
    # gradient step: w[f] <- w[f] - lr * dloss * x[f]
    for f_ in range(len(x)):
        w[f_] += learning_rate * (-x[f_] * dloss)
    return w
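For reference, the same update can be written without the explicit loops; this is a vectorized sketch of the equivalent step, assuming w and x are NumPy arrays and y is a scalar:

import numpy as np

def update_weights_vec(w, x, y, learning_rate):
    # residual from the full inner product w.x (the fix described above)
    dloss = np.dot(w, x) - y
    # one gradient step on the squared loss (up to the constant factor 2)
    return w - learning_rate * dloss * x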
Answer (score 1):

The main error one can notice is that you reshape instead of transpose. Compare:

>>> import numpy as np
>>> X = np.array(range(10)).reshape(2, -1)
>>> X
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> X.reshape(-1, 2)
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> X.T
array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])
>>> X.reshape(-1, 2) == X.T
array([[ True, False],
       [False, False],
       [False, False],
       [False, False],
       [False,  True]], dtype=bool)

The next thing that looks bad is calling sum(array); you should rather call array.sum():

>>> import numpy as np
>>> x = np.array(range(10)).reshape(2, 5)
>>> x
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> sum(x)
array([ 5,  7,  9, 11, 13])
>>> x.sum()
45

After this, it works just fine:

import numpy as np

def loss_prime_simple(w, node, feature, data):
    x = data[3]
    y = data[2]
    x_f = x[node][feature]
    y_node = y[node]
    return -(y_node - w[feature] * x_f) * x_f

def update_weights(w, data, predecs, children, node, learning_rate):
    len_features = len(data[3][0])
    w_new = np.zeros(len_features)
    for feature_ in range(len_features):
        w_new[feature_] = loss_prime_simple(w, node, feature_, data)
    return w - learning_rate * w_new

def loss_simple(w, data):
    y_p = data[2]
    x = data[3]
    return ((y_p - np.dot(w, np.array(x).T)) ** 2).sum()

X = np.random.randn(1000, 3)
y = np.random.randn(1000)

data = None, None, y, X

w = np.array([1, 3, 3])

loss = [loss_simple(w, data)]

for _ in range(20):
    for j in range(X.shape[0]):
        w = update_weights(w, data, None, None, j, 0.001)
        loss.append(loss_simple(w, data))

from matplotlib import pyplot as plt
plt.figure()
plt.plot(loss)
plt.show()

(Plot: the training loss decreasing over the iterations.)

Comments:

Thanks for the suggestions. I tried them, but nothing changed. I edited it into my question. – TobSta

If the problem persists, you have to provide a minimal working example - the **full** code (not just a few methods, maybe you are running them incorrectly) and the data on which it fails. – lejlot

Thanks for pointing that out :) I edited the question. – TobSta
