I'm trying to implement ridge regression in Python, using stochastic gradient descent as the solver. My SGD code is as follows:
from random import shuffle

import numpy as np
import pandas as pd

def fit(self, X, Y):
    # Convert to a DataFrame in case X is a numpy matrix
    X = pd.DataFrame(X)
    # Prepend a column of 1s to the data for the intercept
    X.insert(0, 'intercept', np.ones(X.shape[0]))
    # Dimensions of the training data
    m, d = X.shape
    # Initialize weights randomly
    beta = self.initializeRandomWeights(d)
    beta_prev = None
    epochs = 0
    while beta_prev is None or epochs < self.nb_epochs:
        print("## Epoch: " + str(epochs))
        # In Python 3, range() must be converted to a list before shuffling
        indices = list(range(m))
        shuffle(indices)
        for i in indices:  # Pick training examples in a randomly shuffled order
            beta_prev = beta
            xi = X.iloc[i]
            # Error of the ith training example: sum(beta*x) - y
            errori = sum(beta*xi) - Y[i]
            # Ridge gradient for one example: x_i * error + l * beta
            gradient_vector = xi*errori + self.l*beta_prev
            beta = beta_prev - self.alpha*gradient_vector
        epochs += 1
I'm testing this on data that is not standardized, and my implementation always ends with all the weights at infinity, even when I initialize the weight vector to small values. Only when I set the learning rate alpha to a very small value (~1e-8) does the algorithm finish with valid values for the weight vector.

My understanding is that standardizing/scaling the input features only helps shorten the convergence time, but if the features are not standardized, the algorithm should still converge overall, just more slowly. Is my understanding correct?
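For what it's worth, the divergence can be reproduced directly: each SGD step scales the error by roughly alpha * x_i^2, so with features on the order of 1e3 a fixed learning rate like 1e-2 makes the update blow up, while the same learning rate is stable after standardization. Below is a minimal sketch with hypothetical synthetic data and a stripped-down version of the loop above (the helper `sgd_ridge` and the data are my own, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a single feature on a large scale (~1e3)
n = 200
x_raw = rng.uniform(500.0, 1500.0, size=(n, 1))
y = 3.0 * x_raw[:, 0] / 1000.0 + rng.normal(0.0, 0.1, n)

def sgd_ridge(X, y, alpha, lam=0.1, epochs=20, seed=1):
    """Stripped-down version of the SGD loop from the question."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend the intercept column
    beta = rng.normal(0.0, 0.01, d + 1)    # small random initial weights
    for _ in range(epochs):
        for i in rng.permutation(m):
            err = Xb[i] @ beta - y[i]
            beta = beta - alpha * (err * Xb[i] + lam * beta)
            if not np.all(np.isfinite(beta)):
                return beta                # diverged: weights overflowed
    return beta

# Same learning rate on raw vs. standardized features
beta_raw = sgd_ridge(x_raw, y, alpha=1e-2)   # per-step factor |1 - alpha*x^2| >> 1
X_std = (x_raw - x_raw.mean(axis=0)) / x_raw.std(axis=0)
beta_std = sgd_ridge(X_std, y, alpha=1e-2)   # stable: x^2 is order 1

print("raw features finite?  ", np.all(np.isfinite(beta_raw)))   # False
print("standardized finite?  ", np.all(np.isfinite(beta_std)))   # True
```

This suggests scaling is not just about speed here: for a quadratic loss, a constant step size is only stable when it is small relative to the feature magnitudes, which is why alpha ~ 1e-8 happens to work on the raw data.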