R: How to use predict on a test set

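I will eventually be running multiple regression on several sets of predictors. To make sure I am using the data correctly and getting the results I expect, I am testing on a toy model first. However, when I try to use predict, it does not predict on the new data, and because the new data has a different size than the training set, I get warnings. I have looked online and tried all sorts of things, but none of them worked. I am almost ready to give up and write my own function, but I am also building models with the pls package, which I guess already calls predict internally, so I would like to stay consistent. Here is the short script I wrote: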

x1<-c(1.1,3.4,5.6,1.2,5,6.4,0.9,7.2,5.4,3.1) # Original variables
x2<-c(10,21,25,15.2,18.9,19,16.2,22.1,18.6,22)
y<-2.0*x1+1.12*x2+rnorm(10,mean=0,sd=0.2) # Define output variable
X<-data.frame(x1,x2)
lfit<-lm(y~.,X) # fit model
n_fit<-lfit$coefficients

xg1<-runif(15,1,10) # define new data
xg2<-runif(15,10,30)
X<-data.frame(xg1,xg2) # put into data frame

y_guess<-predict(lfit,newdata=X) # Predict based on fit
y_actual<-2.0*xg1+1.12*xg2 # actual values because I know the coefficients
y_pred=n_fit[1]+n_fit[2]*xg1+n_fit[3]*xg2 # What predict should give me based on fit
print(y_guess-y_actual) # difference check
print(y_guess-y_pred)

These are the values and warning messages I get:

 [1]  -4.7171499 -16.9936498   6.9181074  -6.1964788 -11.1852816   0.9257043 -13.7968731  -6.6624086  15.5365141  -8.5009428
[11] -22.8866505   2.0804016  -1.8728602 -18.7670797   1.2251849
 [1]  -4.582645 -16.903164   7.038968  -5.878723 -11.149987   1.162815 -13.473351  -6.483111  15.731694  -8.456738
[11] -22.732886   2.390507  -1.662446 -18.627342   1.431469
Warning messages:
1: 'newdata' had 15 rows but variables found have 10 rows
2: In y_guess - y_actual :
  longer object length is not a multiple of shorter object length
3: In y_guess - y_pred :
  longer object length is not a multiple of shorter object length

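The fitted coefficients are 1.97 and 1.13, with an intercept of -0.25. The intercept should be 0, but I added noise, and that should not make a big difference. How do I set this up so that I can predict on an independent test set?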

Thanks

Source

2015-10-05 Joshua Mannheimer

You need to use the same column names in the data.frame passed as newdata to predict(), e.g. X <- data.frame(x1 = xg1, x2 = xg2)

From the help documentation, ?predict.lm:

"Variables are first looked for in newdata and then searched for in the usual way (which will include the environment of the formula used in the fit)."

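The data frame you created with X <- data.frame(xg1, xg2) has different column names (xg1, xg2) from the ones used in the fit. predict() cannot find the original names (x1, x2) in newdata, so it falls back to searching the environment of the formula, where it finds the original 10-element training vectors. The result is that you get the fitted values for the original data, which is why the output has 10 elements and the first warning mentions 15 rows versus 10.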
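The fallback described above can be checked directly. The following is a minimal sketch that assumes the same toy data as the question; the seed and the names bad and p_bad are added here only for illustration.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Training data, as in the question
x1 <- c(1.1, 3.4, 5.6, 1.2, 5, 6.4, 0.9, 7.2, 5.4, 3.1)
x2 <- c(10, 21, 25, 15.2, 18.9, 19, 16.2, 22.1, 18.6, 22)
y  <- 2.0 * x1 + 1.12 * x2 + rnorm(10, mean = 0, sd = 0.2)
lfit <- lm(y ~ ., data.frame(x1, x2))

# New data whose column names do NOT match the names used in the fit
bad <- data.frame(xg1 = runif(15, 1, 10), xg2 = runif(15, 10, 30))

# predict() warns and silently uses the global x1 and x2 instead of newdata
p_bad <- suppressWarnings(predict(lfit, newdata = bad))

length(p_bad)                                       # 10 values, not 15
all.equal(unname(p_bad), unname(fitted(lfit)))      # same as the fitted values
```

If the last call reports TRUE, the "predictions" are just the fitted values of the training set, and the 15 new rows were never used.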

Fix the problem by making the column names in newdata match the original ones, i.e. X <- data.frame(x1=xg1, x2=xg2):

x1 <- c(1.1, 3.4, 5.6, 1.2, 5, 6.4, 0.9, 7.2, 5.4, 3.1) # Original variables
x2 <- c(10, 21, 25, 15.2, 18.9, 19, 16.2, 22.1, 18.6, 22) 
y <- 2.0*x1 + 1.12*x2 + rnorm(10, mean=0, sd=0.2) # Define output variable 
X <- data.frame(x1, x2) 
lfit <- lm(y~., X) # fit model 
n_fit <- lfit$coefficients 

xg1 <- runif(15, 1, 10) # define new data 
xg2 <- runif(15, 10, 30) 
X <- data.frame(x1=xg1, x2=xg2) # put into data frame 

y_guess <- predict(lfit, newdata=X) #Predict based on fit 
y_actual <- 2.0*xg1 + 1.12*xg2 # actual values because I know the coefficients 
y_pred = n_fit[1] + n_fit[2]*xg1 + n_fit[3]*xg2 # What predict should give me based on fit 

> print(y_guess - y_actual) #difference check
           1            2            3            4            5            6            7            8            9           10           11           12           13
-0.060223916 -0.047790535 -0.018274280 -0.096190467 -0.079490487 -0.063736231 -0.047506981 -0.009523583 -0.047774006 -0.084276807 -0.106322290 -0.030876942 -0.067232989
          14           15
-0.023060651 -0.041264431
> print(y_guess - y_pred)
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
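Because a name mismatch only produces a warning, it can be worth checking the column names before calling predict(). The sketch below is one way to do that. safe_predict is a hypothetical helper, not part of base R or the pls package, and the data frame names train, test_bad and test_good are assumptions made for this example.

```r
# Hypothetical helper: stop early if newdata lacks a predictor used in the fit
safe_predict <- function(fit, newdata) {
  needed <- all.vars(delete.response(terms(fit)))  # predictor names in the model
  missing_vars <- setdiff(needed, names(newdata))
  if (length(missing_vars) > 0) {
    stop("newdata is missing: ", paste(missing_vars, collapse = ", "))
  }
  predict(fit, newdata = newdata)
}

set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Keep the response and predictors together in one training data frame
train <- data.frame(x1 = c(1.1, 3.4, 5.6, 1.2, 5, 6.4, 0.9, 7.2, 5.4, 3.1),
                    x2 = c(10, 21, 25, 15.2, 18.9, 19, 16.2, 22.1, 18.6, 22))
train$y <- 2.0 * train$x1 + 1.12 * train$x2 + rnorm(10, mean = 0, sd = 0.2)
fit <- lm(y ~ ., data = train)

test_bad  <- data.frame(xg1 = runif(15, 1, 10), xg2 = runif(15, 10, 30))
test_good <- data.frame(x1 = test_bad$xg1, x2 = test_bad$xg2)

# Mismatched names now raise an error instead of a warning
err <- tryCatch(safe_predict(fit, test_bad), error = function(e) e)
print(conditionMessage(err))

# Matching names give one prediction per test row
p_good <- safe_predict(fit, test_good)
manual <- coef(fit)[1] + coef(fit)[2] * test_good$x1 + coef(fit)[3] * test_good$x2
all.equal(unname(p_good), unname(manual))
```

Turning the mismatch into an error is a design choice for this sketch: it trades the permissive lookup that predict() performs for a failure that is hard to overlook.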

Source

2015-10-05 16:25:54
