
Python numpy stochastic gradient descent implementation

I have to implement stochastic gradient descent using Python's numpy library. For this purpose, I am given the following function definitions:

def compute_stoch_gradient(y, tx, w): 
    """Compute a stochastic gradient for batch data.""" 

def stochastic_gradient_descent(
        y, tx, initial_w, batch_size, max_epochs, gamma):
    """Stochastic gradient descent algorithm."""

I am also given the following helper function:

def batch_iter(y, tx, batch_size, num_batches=1, shuffle=True):
    """
    Generate a minibatch iterator for a dataset.
    Takes as input two iterables (here the output desired values 'y' and the input data 'tx')
    Outputs an iterator which gives mini-batches of `batch_size` matching elements from `y` and `tx`.
    Data can be randomly shuffled to avoid ordering in the original data messing with the randomness of the minibatches.
    Example of use :
    for minibatch_y, minibatch_tx in batch_iter(y, tx, 32):
        <DO-SOMETHING>
    """
    data_size = len(y)

    if shuffle:
        shuffle_indices = np.random.permutation(np.arange(data_size))
        shuffled_y = y[shuffle_indices]
        shuffled_tx = tx[shuffle_indices]
    else:
        shuffled_y = y
        shuffled_tx = tx
    for batch_num in range(num_batches):
        start_index = batch_num * batch_size
        end_index = min((batch_num + 1) * batch_size, data_size)
        if start_index != end_index:
            yield shuffled_y[start_index:end_index], shuffled_tx[start_index:end_index]

I implemented the following two functions:

def compute_stoch_gradient(y, tx, w): 
    """Compute a stochastic gradient for batch data.""" 
    e = y - tx.dot(w) 
    return (-1/y.shape[0])*tx.transpose().dot(e) 


def stochastic_gradient_descent(y, tx, initial_w, batch_size, max_epochs, gamma):
    """Stochastic gradient descent algorithm."""
    ws = [initial_w]
    losses = []
    w = initial_w
    for n_iter in range(max_epochs):
        for minibatch_y, minibatch_x in batch_iter(y, tx, batch_size):
            w = ws[n_iter] - gamma * compute_stoch_gradient(minibatch_y, minibatch_x, ws[n_iter])
            ws.append(np.copy(w))
            loss = y - tx.dot(w)
            losses.append(loss)

    return losses, ws

I am not sure whether the iteration should run over range(max_epochs) or over a larger range. I ask because I read that an epoch is "each time we run through the entire dataset", so I think one epoch should consist of more than one iteration... To make my doubt concrete, here is a small sketch of what I expect one epoch to look like (the numbers are just an example, not my current code):
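num_batches = int(np.ceil(len(y) / batch_size))   # e.g. N = 1000, batch_size = 32 -> 32 updates per epoch
for epoch in range(max_epochs):
    for minibatch_y, minibatch_tx in batch_iter(y, tx, batch_size, num_batches=num_batches):
        ...  # one weight update per mini-batch; all of them together form one epoch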


For the second question: read up on **batch**, **mini-batch** and **epoch** with respect to SGD. – sascha


You are calling 'batch_iter' in the inner loop, which instantiates a new generator object on every call. Instead, you want to instantiate a single generator outside the loop and then iterate over it: 'for minibatch_y, minibatch_x in batch_iter(...)'. –

Answer


In a typical implementation, mini-batch gradient descent with batch size B should pick B data points from the dataset uniformly at random and update the weights based on the gradient computed on that subset. This process is then repeated many times until convergence or until some maximum number of iterations is reached. A mini-batch with B = 1 is plain SGD, which can sometimes be noisy. A rough sketch of such an update loop follows below.
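As a minimal sketch (not the assignment's reference solution), one way to write this loop, reusing batch_iter and compute_stoch_gradient from the question, could be the following; the function name minibatch_sgd is mine, and the scalar MSE loss is just one common choice:

import numpy as np

def minibatch_sgd(y, tx, initial_w, batch_size, max_epochs, gamma):
    """Mini-batch SGD sketch; batch_size = 1 recovers plain SGD."""
    w = initial_w
    ws, losses = [initial_w], []
    # Cover the whole (shuffled) dataset once per epoch instead of the default single batch.
    num_batches = int(np.ceil(len(y) / batch_size))
    for epoch in range(max_epochs):
        for minibatch_y, minibatch_tx in batch_iter(y, tx, batch_size, num_batches=num_batches):
            grad = compute_stoch_gradient(minibatch_y, minibatch_tx, w)
            w = w - gamma * grad                    # update the current w, not ws[epoch]
            ws.append(np.copy(w))
            e = y - tx.dot(w)
            losses.append(0.5 * np.mean(e ** 2))    # scalar MSE on the full dataset (one common choice)
    return losses, ws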

In line with the comments above, you probably also want to play with the batch size and the learning rate (step size), since both have a significant impact on the convergence rate of stochastic and mini-batch gradient descent.
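For example, a hypothetical sweep (the particular values are only for illustration, and minibatch_sgd refers to the sketch above):

for batch_size in (1, 10, 100):
    for gamma in (0.1, 0.01, 0.001):
        losses, ws = minibatch_sgd(y, tx, initial_w, batch_size, max_epochs=50, gamma=gamma)
        print(batch_size, gamma, losses[-1])   # compare how fast the loss decreases for each setting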

The following figures show the effect of these two parameters on the convergence rate of SGD with logistic regression, doing sentiment analysis on an Amazon product review dataset, an assignment that appeared in the Coursera course Machine Learning: Classification by the University of Washington:

[Figures: SGD convergence for different batch sizes and step sizes]

For more detailed information, see https://sandipanweb.wordpress.com/2017/03/31/online-learning-sentiment-analysis-with-logistic-regression-via-stochastic-gradient-ascent/?frame-nonce=987e584e16