Multiple linear regression: interaction between a transformed polynomial variable and a factor variable


I have a multiple linear regression, shown below, that includes interaction terms. Some of those terms involve factor variables (season, month, holiday, weekday, weathersit).

regwithint=lm(casual~season:temp+season:month+year:temp+ 
      month:temp+holiday:temp+weekday:hum+season+ 
      month+holiday+weekday+weathersit+temp+windspeed 
      ,data=training)

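However, the variables temp and windspeed have been transformed to (temp^3) and (windspeed^2).

Looking at the interaction terms, I have an interaction temp:weekday, where temp is really temp^3 and weekday is a factor variable.

I know that in most cases I should use I(temp^3), but does pairing it with a factor variable mean I should use poly(temp, 3, raw=T) instead?

Thanks.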

Source

2016-10-12 ieaggie

This question seems strangely familiar.

First, let's establish that I() works correctly in an interaction with a factor variable:

data(iris) 
reg <- lm(Sepal.Length~Species:I(Petal.Length^2), data=iris) 
summary(reg)

Call: 
lm(formula = Sepal.Length ~ Species:I(Petal.Length^2), data = iris) 

Residuals:
     Min       1Q   Median       3Q      Max
-0.87875 -0.22363 -0.00197  0.21664  1.06243

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                         4.245539   0.133172  31.880  < 2e-16 ***
Speciessetosa:I(Petal.Length^2)     0.341688   0.062196   5.494  1.7e-07 ***
Speciesversicolor:I(Petal.Length^2) 0.092381   0.007413  12.462  < 2e-16 ***
Speciesvirginica:I(Petal.Length^2)  0.075714   0.004388  17.253  < 2e-16 ***
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.3431 on 146 degrees of freedom 
Multiple R-squared: 0.8318, Adjusted R-squared: 0.8284 
F-statistic: 240.7 on 3 and 146 DF, p-value: < 2.2e-16

Now, let's see whether the function you asked about, poly(), works as well:

data(iris) 
reg <- lm(Sepal.Length~Species: poly(Petal.Length,2,raw=T), data=iris) 
summary(reg)

It does (note the difference: it necessarily includes the lower-order terms as well):

Call: 
lm(formula = Sepal.Length ~ Species:poly(Petal.Length, 2, raw = T), 
    data = iris) 

Residuals:
     Min       1Q   Median       3Q      Max
-0.73849 -0.22814 -0.01978  0.24177  0.98833

Coefficients:
                                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                                        1.79002    1.58957   1.126   0.2620
Speciessetosa:poly(Petal.Length, 2, raw = T)1      3.87221    2.16771   1.786   0.0762 .
Speciesversicolor:poly(Petal.Length, 2, raw = T)1  1.13016    0.78109   1.447   0.1501
Speciesvirginica:poly(Petal.Length, 2, raw = T)1   0.74216    0.56640   1.310   0.1922
Speciessetosa:poly(Petal.Length, 2, raw = T)2     -1.12847    0.74087  -1.523   0.1299
Speciesversicolor:poly(Petal.Length, 2, raw = T)2 -0.03641    0.09628  -0.378   0.7059
Speciesvirginica:poly(Petal.Length, 2, raw = T)2   0.02178    0.05107   0.426   0.6705
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.3367 on 143 degrees of freedom 
Multiple R-squared: 0.8413, Adjusted R-squared: 0.8346 
F-statistic: 126.4 on 6 and 143 DF, p-value: < 2.2e-16
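
The extra coefficients in the second fit can be checked directly. The following is a minimal sketch on the same built-in iris data, assuming that poly(x, 2, raw = TRUE) expands to x and x^2, so that interacting it with Species should give the same fit as writing the linear and squared interactions out by hand. The object names (fit_poly, fit_explicit, fit_sq_only) are illustrative and not part of the original answer.

```r
data(iris)

# Interaction with the raw polynomial: a linear and a squared term per species
fit_poly <- lm(Sepal.Length ~ Species:poly(Petal.Length, 2, raw = TRUE),
               data = iris)

# The same terms written out explicitly with I()
fit_explicit <- lm(Sepal.Length ~ Species:Petal.Length + Species:I(Petal.Length^2),
                   data = iris)

# Only the squared term per species, as in the first model above
fit_sq_only <- lm(Sepal.Length ~ Species:I(Petal.Length^2), data = iris)

# Number of estimated coefficients in each model (intercept included)
length(coef(fit_poly))
length(coef(fit_sq_only))

# TRUE if the poly() form and the explicit form produce the same fitted values
isTRUE(all.equal(unname(fitted(fit_poly)), unname(fitted(fit_explicit))))
```

So the choice between I(x^2) and poly(x, 2, raw = TRUE) in an interaction is mostly a choice about whether the lower-order terms should be in the model, not about whether the factor interaction works.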

So, what's the difference?

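Well, as I basically said on your other question, I() is simply what the vast majority of R programmers use in lm and glm formulas, because it is more flexible: it can be used for any transformation inside a formula.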
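
One reason I() comes up so often is that operators such as ^ have a formula meaning rather than an arithmetic one inside a model formula, and I() asks R to evaluate the expression arithmetically. A small sketch on iris, with illustrative object names that are not from the original answer:

```r
data(iris)

# Without I(), ^ is a formula operator, so this is just a linear fit in Petal.Length
fit_plain <- lm(Sepal.Length ~ Petal.Length^2, data = iris)
names(coef(fit_plain))

# With I(), the square is computed arithmetically before the model is fitted
fit_I <- lm(Sepal.Length ~ I(Petal.Length^2), data = iris)
names(coef(fit_I))

# I() can wrap any arithmetic expression, for example a product of two columns
fit_area <- lm(Sepal.Length ~ I(Petal.Length * Petal.Width), data = iris)
names(coef(fit_area))
```

Comparing the coefficient names of fit_plain and fit_I shows whether the squared term actually entered the model.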

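But to each their own. SO disallows opinion-based questions, so I am interpreting the question as two questions. "Do both work?" The answer is "yes". "Why is I() commonly used?" The answer is "it is flexible enough for any transformation". As for whether you should use it, that is not a question that can reasonably be asked or answered on SO, but you could ask it on Programmers Stack Exchange (or whatever they are calling it these days) or on Code Review Stack Exchange.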

Source

2016-10-12 23:32:09
