Manual Perceptron example in R - are the results acceptable?

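I am trying to get a perceptron algorithm for classification working, but I think something is missing. This is the decision boundary achieved with logistic regression: [plot not shown]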

The red dots got into college, after performing better on tests 1 and 2.

This is the data, and this is the code for the logistic regression in R:

dat = read.csv("perceptron.txt", header = FALSE)
colnames(dat) = c("test1", "test2", "y")
plot(test2 ~ test1, col = as.factor(y), pch = 20, data = dat)
fit = glm(y ~ test1 + test2, family = "binomial", data = dat)
coefs = coef(fit)
# Decision boundary: coefs[1] + coefs[2]*test1 + coefs[3]*test2 = 0, solved for test2.
(x = c(min(dat[,1]) - 2, max(dat[,1]) + 2))
(y = c((-1/coefs[3]) * (coefs[2] * x + coefs[1])))
lines(x, y)

The code for the "manual" implementation of the perceptron is as follows:

# DATA PRE-PROCESSING:
dat = read.csv("perceptron.txt", header = FALSE)
dat[,1:2] = apply(dat[,1:2], MARGIN = 2, FUN = function(x) scale(x)) # Scaling the data.
data = data.frame(rep(1, nrow(dat)), dat)  # Introducing the "bias" column.
colnames(data) = c("bias", "test1", "test2", "y")
data$y[data$y == 0] = -1                   # Turning the 0/1 dependent variable into -1/1.
data = as.matrix(data)                     # Turning the data.frame into a matrix to avoid matrix-multiplication problems.

# PERCEPTRON:
set.seed(62416)
no.iter = 1000                       # Number of passes over the data.
theta = rnorm(ncol(data) - 1)        # Starting with a random vector of coefficients.
theta = theta / sqrt(sum(theta^2))   # Normalizing the vector.
h = theta %*% t(data[,1:3])          # Performing the first f(theta^T X).

for (i in 1:no.iter) {               # We will recalculate 1,000 times.
  for (j in 1:nrow(data)) {          # Each time we go through each example.
    # h is refreshed only once per pass, so this test uses the theta from the start of the pass.
    if (h[j] * data[j, 4] < 0) {     # If the hypothesis disagrees with the sign of y,
      theta = theta + (sign(data[j, 4]) * data[j, 1:3])  # we + or - the example from theta.
    }                                # Otherwise theta is left unchanged.
  }
  h = theta %*% t(data[,1:3])        # Recalculating h() after each pass.
}
theta                                # Final coefficients.
mean(sign(h) == data[,4])            # Accuracy.

Running the perceptron code, I get the following coefficients:

     bias     test1     test2
 9.131054 19.095881 20.736352

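and an accuracy of 88%, consistent with the 89% calculated with the glm() logistic regression function via mean(sign(predict(fit))==data[,4]). Logically, there is no way of linearly classifying all of the points, as is obvious from the plot above. In fact, iterating only 10 times and plotting the accuracy, ~90% is reached after just 1 iteration. [plot of accuracy by iteration not shown]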

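Since this is in line with the training classification performance of logistic regression, it is likely that the code is not conceptually wrong.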
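One detail of the loop above is that h is only refreshed once per pass, so every example in a pass is tested against the theta from the start of that pass (a batch-style update). For comparison, here is a minimal sketch of the online variant, in which each example is checked against the current theta. It runs on hypothetical toy data that is separable by construction, not on perceptron.txt, and the names max_passes and acc_per_pass are made up for illustration.

```r
# Hypothetical toy data (not the perceptron.txt file from the question):
# two features in [-1, 1], labelled by the sign of x1 + x2, with points
# close to that line dropped so the classes are linearly separable.
set.seed(1)
raw   <- matrix(runif(400, min = -1, max = 1), ncol = 2)
score <- raw[, 1] + raw[, 2]
keep  <- abs(score) > 0.2
X <- cbind(bias = 1, x1 = raw[keep, 1], x2 = raw[keep, 2])
y <- sign(score[keep])                 # Labels are -1 / +1.

theta        <- c(0, 0, 0)             # Start from the zero vector.
max_passes   <- 1000                   # Upper limit on passes over the data.
acc_per_pass <- numeric(0)             # Training accuracy after each pass.

for (pass in seq_len(max_passes)) {
  mistakes <- 0
  for (j in seq_len(nrow(X))) {
    # Test example j against the *current* theta, not a cached h.
    if (y[j] * sum(theta * X[j, ]) <= 0) {
      theta    <- theta + y[j] * X[j, ]
      mistakes <- mistakes + 1
    }
  }
  acc_per_pass <- c(acc_per_pass, mean(sign(X %*% theta) == y))
  if (mistakes == 0) break             # A full pass without updates: stop.
}

theta
acc_per_pass
```

On separable data like this the loop stops once a full pass makes no updates. On data that is not linearly separable, as in the plot above, it would simply run until max_passes, so the accuracy recorded per pass is the more useful thing to look at.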

QUESTION: Is it OK to get coefficients so different from those of the logistic regression:

(Intercept)       test1       test2
   1.718449    4.012903    3.743903


2016-06-25 Toni

This is really more of a Cross Validated question than a Stack Overflow question, but I'll go ahead and answer.

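Yes, it is normal and expected to get very different coefficients, because you cannot directly compare the magnitude of the coefficients between these two techniques.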

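With the logit (logistic) model you are using a binomial distribution and a logit link, with a cost function based on the sigmoid. The coefficients are only meaningful in that context. You also have an intercept term in the logit model.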

None of this is true for the perceptron model. The interpretation of the coefficients is therefore totally different.
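To make the scale point concrete, here is a minimal sketch using the two coefficient vectors quoted in the question. It assumes both were fitted on the same standardized test1/test2 features, which the question does not state explicitly, and the helper name boundary is made up for illustration. Multiplying a perceptron's theta by any positive constant leaves every predicted sign unchanged, so what can sensibly be compared is the line theta0 + theta1*test1 + theta2*test2 = 0, not the raw magnitudes.

```r
# Coefficients as quoted in the question (bias/intercept, test1, test2).
theta_perceptron <- c(bias = 9.131054, test1 = 19.095881, test2 = 20.736352)
theta_logistic   <- c(bias = 1.718449, test1 = 4.012903,  test2 = 3.743903)

# Hypothetical helper: rewrite theta0 + theta1*test1 + theta2*test2 = 0
# as test2 = intercept + slope * test1.
boundary <- function(theta) {
  c(intercept = unname(-theta[1] / theta[3]),
    slope     = unname(-theta[2] / theta[3]))
}

b_perc   <- boundary(theta_perceptron)
b_logit  <- boundary(theta_logistic)
b_scaled <- boundary(10 * theta_perceptron)  # Same line after rescaling theta.

print(rbind(perceptron = b_perc, perceptron_x10 = b_scaled, logistic = b_logit))
```

Under that assumption the two lines come out close (intercepts of roughly -0.44 and -0.46, slopes of roughly -0.92 and -1.07), and either one could be drawn on the scaled scatter plot with abline(a = intercept, b = slope).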

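Now, that says nothing about which model is better. There are no comparable performance metrics in your question that would allow us to determine that. To find out, you should do cross-validation or at least use a holdout sample.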
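As a rough sketch of what a holdout comparison could look like: the data below is a hypothetical stand-in for perceptron.txt (two standardized scores and a noisy 0/1 label), the 70/30 split is an arbitrary choice, and the perceptron function is a simplified batch-style version written for this example, not the code from the question.

```r
# Hypothetical toy data standing in for perceptron.txt (an assumption).
set.seed(42)
n   <- 200
toy <- data.frame(test1 = rnorm(n), test2 = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(1.5 + 2 * toy$test1 + 2 * toy$test2))

# Hold out 30% of the rows; fit only on the remaining 70%.
test_idx <- sample(n, size = round(0.3 * n))
train <- toy[-test_idx, ]
test  <- toy[test_idx, ]

# Logistic regression: predict class 1 when the linear predictor is positive.
fit <- glm(y ~ test1 + test2, family = binomial, data = train)
acc_logit <- mean((predict(fit, newdata = test) > 0) == (test$y == 1))

# Simplified batch-style perceptron on -1/+1 labels (illustrative only).
perceptron <- function(X, y, passes = 100) {
  theta <- rep(0, ncol(X))
  for (i in seq_len(passes)) {
    wrong <- as.vector(X %*% theta) * y <= 0       # Misclassified rows.
    if (!any(wrong)) break
    theta <- theta + colSums(X[wrong, , drop = FALSE] * y[wrong])
  }
  theta
}

X_train <- cbind(1, as.matrix(train[, c("test1", "test2")]))
X_test  <- cbind(1, as.matrix(test[, c("test1", "test2")]))
theta   <- perceptron(X_train, ifelse(train$y == 1, 1, -1))
acc_perc <- mean((as.vector(X_test %*% theta) > 0) == (test$y == 1))

# Accuracy on rows that neither model saw during fitting.
c(logistic = acc_logit, perceptron = acc_perc)
```

Both accuracies are computed on the same held-out rows, which is what makes them comparable. A single split is noisy, so repeating it or using k-fold cross-validation would give a steadier picture.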


2016-06-25 00:26:51

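I was torn between posting here or on CV (Cross Validated). Thank you for your answer. So, I would like to know: 1. whether my code is correct; 2. whether I can use the coefficients of the perceptron to generate the decision boundary line (do you know how?); and 3. whether the column of ones (bias) in my "manual" approach is not equivalent to the intercept (I think the answer here is the "threshold" value). – Toni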

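@Toni I did not check the logic of your perceptron procedure. May I ask why you are doing it manually? If you used a package, we could be sure it is correct, and it would be easier to help you generate the decision boundary plot. –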

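I feel that I understand the process better when I work through it by coding a silly, simple example with some toy data. Do you know how to plot the decision boundary for the new coefficients? – Toni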
