2013-05-25

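R: polynomial linear model coefficients don't match the model's predicted values

I am trying to fit some models to some data. The resulting model predicts reasonable values and the plot looks correct, but when I extract the coefficients and plot the function separately, it makes no sense! I'm clearly doing something wrong, so could someone tell me where the mistake is?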

Data:

dput(distcur) 
structure(list(id1 = c(1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6, 1.6 
), range = c(-39.898125, -21.448125, -11.07, -3.22875, 3.776484375, 
12.309609375, 22.399453125, 39.235078125), meanrat = c(20.2496, 
17.7504273504274, 12.76875, 2.475, -1.4295652173913, -3.9603305785124, 
-14.7008547008547, -19.7366666666667)), .Names = c("id1", "range", 
"meanrat"), row.names = 9:16, class = "data.frame") 

library(ggplot2) 

id = 1.6
degree = 3
measure = "range"  # assumption: 'measure' is used below but not defined in the posted code

press_x <- seq(min(distcur$range), max(distcur$range), length = 500) 
moddist3b <- lm(meanrat ~ poly(range, degree), distcur) 
valsdist = data.frame(predict(moddist3b, data.frame(range = press_x))) 

colnames(valsdist) = "pred" 

valsdist$id1 = id 

allvals = cbind(valsdist, press_x) 

summary(moddist3b) 

#test plot 
pdf(paste("mod-",measure,id)) 
TITLE = paste("Distance ID: ", id, "Model = line, Points = exp1") 

p = ggplot(allvals, aes(x=press_x, y=pred, colour=factor(id1))) +
      geom_line() +
      geom_point(data=distcur, aes(shape=factor(id1), x = range, y = meanrat, colour = factor(id1))) +
      ylim(-100, 100) +
      labs(title=TITLE) +
      ylab("Mean Rating (%)") +
      xlab(measure)


print(p) 
dev.off() 

[Figure: plot of the model (line) vs. the data points]

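I know the image is of very poor quality, but it shows that the fit is correct. However, the function built from the model's coefficients looks nothing like the plot: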

summary(moddist3b) 

Call: 
lm(formula = meanrat ~ poly(range, degree), data = distcur) 

Residuals:
       9       10       11       12       13       14       15       16
-0.20134  0.44939  1.65996 -2.80500 -1.14594  2.98617 -0.92081 -0.02244

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)            1.6770     0.8281   2.025   0.1128
poly(range, degree)1 -37.7155     2.3423 -16.102  8.7e-05 ***
poly(range, degree)2  -2.9435     2.3423  -1.257   0.2773
poly(range, degree)3   6.4888     2.3423   2.770   0.0503 .
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 2.342 on 4 degrees of freedom 
Multiple R-squared: 0.9853, Adjusted R-squared: 0.9743 
F-statistic: 89.51 on 3 and 4 DF, p-value: 0.0004019 

which gives the function y = 6.49x^3 - 2.94x^2 - 37.72x + 1.68

Plotting it on Google clearly shows that this function is not the same as the plot from R (from the model):

https://www.google.com/search?q=6.49x^3+%E2%88%922.94x^2+%E2%88%92+37.72x+%2B+1.68&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial&client=iceweasel-a&channel=fflb#client=iceweasel-a&rls=org.mozilla:en-US%3Aunofficial&channel=fflb&sclient=psy-ab&q=6.49*x^3+-2.94*x^2+-+37.72*x+%2B+1.68&oq=6.49*x^3+-2.94*x^2+-+37.72*x+%2B+1.68&gs_l=serp.3...3610.3975.1.4155.2.2.0.0.0.0.107.147.1j1.2.0...0.0...1c.1.14.psy-ab.4C6De6gdmtg&pbx=1&bav=on.2,or.r_qf.&bvm=bv.47008514,d.d2k&fp=5e81885614cfda4f&biw=1440&bih=667


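Just a guess, but you may want to wrap your independent variable as 'I(poly(range,degree))' so that 'formula' interprets it the way you want. Things like '+' and '*' have different meanings in R formulas. –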


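@CarlWitthoft Adding 'I' gives exactly the same model, but the predicted values are almost a horizontal line, far away from the experimental points. The coefficients are still the same as in my question. I'm not sure why it affects the prediction, but I still don't have a function to draw the line with. – unixsnob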
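A likely explanation for the near-horizontal line, sketched here under the assumption that the question's data is used (rebuilt as a plain data.frame; 'fit_orth', 'fit_I', 'new_df', 'pred_orth' and 'pred_I' are hypothetical names). The fitted coefficients are identical, but 'predict()' only reuses the orthogonal basis stored from the 8 data points when the term is a bare 'poly()' call. Wrapped in 'I()', 'poly()' is recomputed on the 500 new points, so its columns are rescaled over those points and the predictions collapse toward the intercept. This is inferred from how R handles 'poly()' terms at prediction time, not something stated in the thread.

```r
# Data from the question, rebuilt from the dput() output
distcur <- data.frame(
  id1     = rep(1.6, 8),
  range   = c(-39.898125, -21.448125, -11.07, -3.22875, 3.776484375,
              12.309609375, 22.399453125, 39.235078125),
  meanrat = c(20.2496, 17.7504273504274, 12.76875, 2.475, -1.4295652173913,
              -3.9603305785124, -14.7008547008547, -19.7366666666667)
)

fit_orth <- lm(meanrat ~ poly(range, 3), data = distcur)
fit_I    <- lm(meanrat ~ I(poly(range, 3)), data = distcur)

# Same design matrix at fit time, so the coefficients are identical
print(all.equal(unname(coef(fit_I)), unname(coef(fit_orth))))

press_x <- seq(min(distcur$range), max(distcur$range), length.out = 500)
new_df  <- data.frame(range = press_x)

# Plain poly(): predict() reuses the basis computed from the 8 data points.
# I(poly()): the basis is recomputed (and renormalised) on the 500 new points.
pred_orth <- predict(fit_orth, new_df)
pred_I    <- predict(fit_I, new_df)

# The I() predictions span a much narrower range, i.e. an almost flat line
print(c(spread_orth = diff(range(pred_orth)), spread_I = diff(range(pred_I))))
```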

Answer


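The problem you're having has nothing to do with ggplot. Rather, it's how you defined your linear model. Incidentally, the way I figured out what was going on was to predict at 0: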

R> (moddist3b <- lm(meanrat ~ poly(range, 3), distcur)) 

Coefficients: 
(Intercept) poly(range, 3)1 poly(range, 3)2 poly(range, 3)3 
     1.68   -37.72   -2.94    6.49 

R> predict(moddist3b, data.frame(range = 0)) 
    1 
2.733 

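and note that the prediction is off (if these were ordinary polynomial coefficients, the prediction at 0 would be the intercept, 1.68).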
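A minimal sketch of where the 2.733 comes from, assuming the question's data (rebuilt as a plain data.frame; 'fit_orth', 'basis', 'basis0', 'manual0' and 'pred0' are hypothetical names). With the default 'poly()', each coefficient multiplies a column of an orthogonal basis, and that basis is not zero at range = 0. Calling 'predict()' on the object returned by 'poly()' evaluates the same basis at new points, so the prediction at 0 can be rebuilt by hand as intercept plus basis-times-coefficients.

```r
# Data from the question, rebuilt from the dput() output
distcur <- data.frame(
  id1     = rep(1.6, 8),
  range   = c(-39.898125, -21.448125, -11.07, -3.22875, 3.776484375,
              12.309609375, 22.399453125, 39.235078125),
  meanrat = c(20.2496, 17.7504273504274, 12.76875, 2.475, -1.4295652173913,
              -3.9603305785124, -14.7008547008547, -19.7366666666667)
)

# Default poly(): orthogonal polynomial basis
fit_orth <- lm(meanrat ~ poly(range, 3), data = distcur)

# Same basis as the one built inside the model (same data, same degree)
basis <- poly(distcur$range, 3)

# Evaluate that basis at range = 0; its entries are not all zero
basis0 <- predict(basis, 0)
print(basis0)

# Intercept + (basis at 0) %*% (remaining coefficients) reproduces predict()
manual0 <- unname(coef(fit_orth)[1] + drop(basis0 %*% coef(fit_orth)[-1]))
pred0   <- unname(predict(fit_orth, data.frame(range = 0)))
print(c(intercept = unname(coef(fit_orth)[1]), manual = manual0, predict = pred0))
```

So the summary coefficients are weights on these basis columns, not on x, x^2 and x^3, which is why reading them off as 6.49x^3 - 2.94x^2 - 37.72x + 1.68 gives the wrong curve.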

Anyway, you need to fit the model using the argument 'raw=TRUE':

(moddist3b <- lm(meanrat ~ poly(range, 3, raw=TRUE), distcur)) 
predict(moddist3b, data.frame(range = 0)) 

This gives you what you expect. By default, 'poly' fits orthogonal polynomials. For more details, see this blog post and the 'poly' help page.
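A sketch of checking the raw fit, assuming the question's data and the same 500-point grid as 'press_x' ('fit_raw', 'b', 'manual' and 'model_pred' are hypothetical names). With 'raw = TRUE' the coefficients are the plain polynomial coefficients, so evaluating b1 + b2*x + b3*x^2 + b4*x^3 by hand reproduces 'predict()', and the fitted values match the orthogonal fit because both parameterise the same cubic.

```r
# Data from the question, rebuilt from the dput() output
distcur <- data.frame(
  id1     = rep(1.6, 8),
  range   = c(-39.898125, -21.448125, -11.07, -3.22875, 3.776484375,
              12.309609375, 22.399453125, 39.235078125),
  meanrat = c(20.2496, 17.7504273504274, 12.76875, 2.475, -1.4295652173913,
              -3.9603305785124, -14.7008547008547, -19.7366666666667)
)

fit_orth <- lm(meanrat ~ poly(range, 3), data = distcur)
fit_raw  <- lm(meanrat ~ poly(range, 3, raw = TRUE), data = distcur)

# Raw coefficients: b[1] + b[2]*x + b[3]*x^2 + b[4]*x^3
b <- unname(coef(fit_raw))
print(b)

# Same grid as press_x in the question
press_x <- seq(min(distcur$range), max(distcur$range), length.out = 500)

# Evaluate the cubic by hand and compare with predict()
manual     <- b[1] + b[2] * press_x + b[3] * press_x^2 + b[4] * press_x^3
model_pred <- unname(predict(fit_raw, data.frame(range = press_x)))
print(all.equal(manual, model_pred))

# Both fits describe the same cubic, so their fitted values agree
print(all.equal(unname(fitted(fit_raw)), unname(fitted(fit_orth))))
```

Plotting 'manual' against 'press_x' should then trace the same line as the model in the question's plot. The orthogonal basis remains the default because raw powers become strongly correlated as the degree grows, which makes the fit numerically less stable.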


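Thanks, I'll try it and get back to you. I didn't think it was ggplot2, but rather something to do with how I created the model. I find it hard to find good resources that cover the basics. I had never come across 'raw'. I'll look into the blog you mentioned. Cheers – unixsnob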