2013-03-18

I'm trying to use stochastic gradient descent in R to build my own logistic regression function, but what I have right now lets the weights grow without bound, so the loop never terminates:

# Logistic regression 
# Takes training example vector, output vector, learn rate scalar, and convergence delta limit scalar 
my_logr <- function(training_examples,training_outputs,learn_rate,conv_lim) { 
    # Initialize gradient vector 
    gradient <- as.vector(rep(0,NCOL(training_examples))) 
    # Difference between weights 
    del_weights <- as.matrix(1) 
    # Weights 
    weights <- as.matrix(runif(NCOL(training_examples))) 
    weights_old <- as.matrix(rep(0,NCOL(training_examples))) 

    # Compute gradient 
    while(norm(del_weights) > conv_lim) { 

    for (k in 1:NROW(training_examples)) { 
     gradient <- gradient + 1/NROW(training_examples)* 
     ((t(training_outputs[k]*training_examples[k,] 
      /(1+exp(training_outputs[k]*t(weights)%*%as.numeric(training_examples[k,])))))) 
    } 

    # Update weights 
    weights <- weights_old - learn_rate*gradient 
    del_weights <- as.matrix(weights_old - weights) 
    weights_old <- weights 

    print(weights) 
    } 
    return(weights) 
} 

The function can be tested with the following code:

data(iris) # Iris data already present in R  
# Dataset for part a (first 50 vs. last 100) 
iris_a <- iris 
iris_a$Species <- as.integer(iris_a$Species) 
# Convert list to binary class 
for (i in 1:NROW(iris_a$Species)) {if (iris_a$Species[i] != "1") {iris_a$Species[i] <- -1}}  
random_sample <- sample(1:NROW(iris),50) 

weights_a <- my_logr(iris_a[random_sample,1:4],iris_a$Species[random_sample],1,.1) 

I double-checked my algorithm against Abu-Mostafa's, which is as follows:

  1. Initialize the weight vector
  2. For each epoch, compute the gradient:
    gradient <- -1/N * sum_{1 to N} (training_answer_n * training_vector_n/(1 + exp(training_answer_n * dot(weight,training_vector_n))))
  3. weight_new <- weight - learn_rate*gradient
  4. Repeat until the change in the weights is sufficiently small
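The four steps above can be sketched as a small batch-gradient version in R (illustrative only; the names `X`, `y`, `eta`, and `tol` are my own, not from the post):

```r
# Sketch of steps 1-4: batch gradient descent for logistic regression.
# X is an N x p matrix of examples, y a vector of +/-1 labels;
# eta (learn rate) and tol (convergence limit) mirror learn_rate/conv_lim.
logr_gd <- function(X, y, eta = 0.1, tol = 1e-4, max_iter = 1000) {
  X <- as.matrix(X)
  w <- rep(0, ncol(X))                      # step 1: initialize weights
  for (iter in 1:max_iter) {
    margins <- y * as.vector(X %*% w)       # y_n * dot(w, x_n) for all n
    # step 2: gradient <- -1/N * sum_n y_n * x_n / (1 + exp(margins_n))
    grad <- -colSums(y * X / (1 + exp(margins))) / nrow(X)
    w_new <- w - eta * grad                 # step 3: take a gradient step
    if (sqrt(sum((w_new - w)^2)) < tol) {   # step 4: stop on a small delta
      return(w_new)
    }
    w <- w_new
  }
  w
}
```

On the iris setup from the question this would be called as `logr_gd(iris_a[random_sample, 1:4], iris_a$Species[random_sample], eta = 0.1)`.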

Am I missing something here?


Am I missing a normalization term on the weights? Or is this maybe a Cross Validated question? – 2013-03-18 13:50:48


From a mathematical perspective, the unconstrained magnitude of the weight vector doesn't produce a unique solution. When I added these two lines to the classifier function, normalizing with `weights <- weights/norm(weights)` after initialization and again after each `weights <- weights_old - learn_rate*gradient` update, it converged in two steps. – 2013-03-18 13:58:24


Did the answer below help? – 2013-03-18 16:43:21

Answers


From a mathematical perspective, the unconstrained magnitude of the weight vector does not produce a unique solution. When I added these two lines to the classifier function, it converged in two steps:

# Normalize 
weights <- weights/norm(weights) 

...

# Update weights 
weights <- weights_old - learn_rate*gradient 
weights <- weights/norm(weights) 

I couldn't get @SimonO101's approach to work, and I'm not using this code for real work (R has built-ins like glm), so it's enough to do loops I actually understand. The whole function looks like this:

# Logistic regression 
# Takes training example vector, output vector, learn rate scalar, and convergence delta limit scalar 
my_logr <- function(training_examples,training_outputs,learn_rate,conv_lim) { 
    # Initialize gradient vector 
    gradient <- as.vector(rep(0,NCOL(training_examples))) 
    # Difference between weights 
    del_weights <- as.matrix(1) 
    # Weights 
    weights <- as.matrix(runif(NCOL(training_examples))) 
    weights_old <- as.matrix(rep(0,NCOL(training_examples))) 

    # Normalize 
    weights <- weights/norm(weights) 

    # Compute gradient 
    while(norm(del_weights) > conv_lim) { 

    for (k in 1:NROW(training_examples)) { 
     gradient <- gradient - 1/NROW(training_examples)* 
     ((t(training_outputs[k]*training_examples[k,] 
      /(1+exp(training_outputs[k]*t(weights)%*%as.numeric(training_examples[k,])))))) 
    } 
#  gradient <- -1/NROW(training_examples) * sum(training_outputs * training_examples/(1 + exp(training_outputs * weights%*%training_outputs))) 

    # Update weights 
    weights <- weights_old - learn_rate*gradient 
    weights <- weights/norm(weights) 
    del_weights <- as.matrix(weights_old - weights) 
    weights_old <- weights 

    print(weights) 
    } 
    return(weights) 
} 

+1. Looks good. I hadn't fully understood the algorithm's procedure. I'll start poking at this again tonight. – 2013-03-19 13:30:22


What do you mean by "weights"? How do you compute the standard errors and p-values? Can you give a reference for this algorithm? Thanks! – qed 2014-03-30 19:00:15


I have also tried implementing logistic regression in C++, but with the IRLS algorithm, which involves matrix inversion and is sometimes a real headache. – qed 2014-03-30 19:01:52


There are a couple of problems here. First, you can make much better use of R's vectorization. Second, I'm not an expert in stochastic gradient descent, but the algorithm you give below your question does not correspond to how you compute the gradient in your function. Double-check this code, but it seems to converge, and I think it follows Abu-Mostafa's. I gather you want to compute this gradient:

gradient <- -1/N * sum(training_outputs * training_examples/(1 + exp(training_outputs * dot(weights, training_examples)))) 

Then that part of your algorithm should read...

while(norm(del_weights) > conv_lim) { 
gradient <- -1/NROW(iris_a) * sum(training_outputs * training_examples/(1 + exp(training_outputs * as.matrix(training_examples) %*% weights))) 

# Update weights 
weights <- weights_old - learn_rate*gradient 
del_weights <- as.matrix(weights_old - weights) 
weights_old <- weights 
print(weights) 

}

You can create the binary classification of the Species variable more easily using:

iris_a$Species <- as.numeric(iris_a$Species) 
iris_a$Species[ iris_a$Species != 1 ] <- -1  

I can't tell you whether the results it returns are sensible, but that code should follow step 2. Check each step carefully, and remember that R is vectorized, so you can do element-wise operations on vectors without a loop. For example:

x <- 1:5 
y <- 1:5 
x*y 
#[1] 1 4 9 16 25 

Hmm, this helps a lot (I had also glossed over that the other binary iris sets weren't set up correctly; not shown here), but now I get another strange result, which is that every classifier trains the same way: (http://i.imgur.com/qvamVQW.png) – 2013-03-19 00:23:41


When the regression function returns, it looks like all the weights are the same. Are you sure those multiplications are correct? – 2013-03-19 00:32:12


@TrevorAlexander Hi, no, those multiplications were not correct, nice catch. I saw you posted a working solution, but I want to go back and comb through the vectorized code and get it working, because that's a good habit! – 2013-03-19 13:29:45
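For what it's worth, the `sum()` call in the vectorized loop above collapses the whole expression to a single scalar, which would make every component of the weight vector update identically; a per-coordinate version can be sketched like this (the names `X`, `y`, and `w` are mine, not from the thread):

```r
# Sketch: per-coordinate vectorized gradient for step 2 of the algorithm.
# X is an N x p matrix of examples, y a vector of +/-1 labels, w the weights.
vec_gradient <- function(X, y, w) {
  X <- as.matrix(X)
  margins <- y * as.vector(X %*% w)   # y_n * dot(w, x_n) for every row n
  # colSums keeps one entry per weight instead of collapsing to one number
  -colSums(y * X / (1 + exp(margins))) / nrow(X)
}
```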