2016-03-08

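ANOVA of linear models differs between Java and R when no intercept is used

This is a follow-up to a question I asked here: Is there an equivalent function for anova.lm() in Java?. With the help of those answers I can get the same results in Java as in R for an ANOVA of two linear models that include an intercept. However, when I remove the intercept from the linear models, the residual sums of squares agree but the p-values in Java and R differ.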

Should the F distribution be computed differently when the intercept is removed?

In R:

test_trait <- c(-0.48812477 , 0.33458213, -0.52754476, -0.79863471, -0.68544309, -0.12970239, 0.02355622, -0.31890850,0.34725819 , 0.08108851) 
geno_A <- c(1, 0, 1, 2, 0, 0, 1, 0, 1, 0) 
geno_B <- c(0, 0, 0, 1, 1, 0, 0, 0, 0, 0) 

fit <- lm(test_trait ~ geno_A+geno_B) 
fit2 <- lm(test_trait ~ geno_A + geno_B + geno_A:geno_B) 
anova(fit, fit2) 
#   Res.Df     RSS Df Sum of Sq      F Pr(>F)
# 1      7 0.77982
# 2      6 0.77053  1 0.0092897 0.0723  0.797

fit <- lm(test_trait ~ geno_A+geno_B -1) 
fit2 <- lm(test_trait ~ geno_A + geno_B + geno_A:geno_B-1) 
anova(fit, fit2) 
#   Res.Df     RSS Df Sum of Sq      F Pr(>F)
# 1      8 0.78539
# 2      7 0.77080  1  0.014593 0.1325 0.7266

In Java:

double[] y = {-0.48812477, 0.33458213, -0.52754476, -0.79863471, -0.68544309, -0.12970239, 0.02355622, -0.31890850, 0.34725819, 0.08108851}; 
double[][] x = {{1,0}, {0,0}, {1,0}, {2,1}, {0,1}, {0,0}, {1,0}, {0,0}, {1,0}, {0,0}}; 
double[][] xb = {{1,0,0}, {0,0,0}, {1,0,0}, {2,1,2}, {0,1,0}, {0,0,0}, {1,0,0}, {0,0,0}, {1,0,0}, {0,0,0}}; 
OLSMultipleLinearRegression regr = new OLSMultipleLinearRegression(); 
regr.newSampleData(y, x); 
double sumOfSquaresModelA = regr.calculateResidualSumOfSquares(); 
regr.newSampleData(y, xb); 
double sumOfSquaresModelB = regr.calculateResidualSumOfSquares(); 
int degreesOfFreedomA = y.length - (x[0].length + 1); 
int degreesOfFreedomB = y.length - (xb[0].length + 1); 
double MSE = sumOfSquaresModelB/degreesOfFreedomB; 
System.out.printf("RSS intercept: %f\n",sumOfSquaresModelB); 
int degreesOfFreedomDifference = Math.abs(degreesOfFreedomB - degreesOfFreedomA); 
double MSEdiff = Math.abs((sumOfSquaresModelB - sumOfSquaresModelA)/(degreesOfFreedomDifference)); 
double Fval = MSEdiff/MSE; 
FDistribution Fdist = new FDistribution(degreesOfFreedomDifference, degreesOfFreedomB); 
double pval = 1 - Fdist.cumulative(Fval); 
System.out.printf("pval with intercept: %f\n",pval); 
regr.setNoIntercept(true); 
regr.newSampleData(y, x); 
double sumOfSquaresNoInterceptA = regr.calculateResidualSumOfSquares(); 
regr.newSampleData(y, xb); 
double sumOfSquaresNoInterceptB = regr.calculateResidualSumOfSquares(); 
MSE = sumOfSquaresNoInterceptB/degreesOfFreedomB; 
System.out.printf("RSS no intercept: %f\n",sumOfSquaresNoInterceptB); 
degreesOfFreedomDifference = Math.abs(degreesOfFreedomB - degreesOfFreedomA); 
MSEdiff = Math.abs((sumOfSquaresNoInterceptB - sumOfSquaresNoInterceptA)/(degreesOfFreedomDifference)); 
Fval = MSEdiff/MSE; 
Fdist = new FDistribution(degreesOfFreedomDifference, degreesOfFreedomB); 
pval = 1 - Fdist.cumulative(Fval); 
System.out.printf("pval without intercept: %f",pval); 

Result:

RSS intercept: 0.770528    //correct 
pval with intercept: 0.796973  //correct 
RSS no intercept: 0.770799   //correct 
pval without intercept: 0.747564  //wrong 

Answer

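Removing the intercept adds one residual degree of freedom, and my Java code did not account for that extra degree of freedom. The following does give the same result as R: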

double[] y = {-0.48812477, 0.33458213, -0.52754476, -0.79863471, -0.68544309, -0.12970239, 0.02355622, -0.31890850, 0.34725819, 0.08108851}; 
double[][] x = {{1,0}, {0,0}, {1,0}, {2,1}, {0,1}, {0,0}, {1,0}, {0,0}, {1,0}, {0,0}}; 
double[][] xb = {{1,0,0}, {0,0,0}, {1,0,0}, {2,1,2}, {0,1,0}, {0,0,0}, {1,0,0}, {0,0,0}, {1,0,0}, {0,0,0}}; 
OLSMultipleLinearRegression regr = new OLSMultipleLinearRegression(); 
int degreesOfFreedomA = y.length - (x[0].length); // no + 1: without an intercept, only the predictors are estimated 
int degreesOfFreedomB = y.length - (xb[0].length); // no + 1, for the same reason 
regr.setNoIntercept(true); 
regr.newSampleData(y, x); 
double sumOfSquaresNoInterceptA = regr.calculateResidualSumOfSquares(); 
regr.newSampleData(y, xb); 
double sumOfSquaresNoInterceptB = regr.calculateResidualSumOfSquares(); 
double MSE = sumOfSquaresNoInterceptB/degreesOfFreedomB; 
System.out.printf("RSS no intercept: %f\n",sumOfSquaresNoInterceptB); 
int degreesOfFreedomDifference = Math.abs(degreesOfFreedomB - degreesOfFreedomA); 
double MSEdiff = Math.abs((sumOfSquaresNoInterceptB - sumOfSquaresNoInterceptA)/(degreesOfFreedomDifference)); 
double Fval = MSEdiff/MSE; 
FDistribution Fdist = new FDistribution(degreesOfFreedomDifference, degreesOfFreedomB); 
double pval = 1 - Fdist.cumulative(Fval); 
System.out.printf("pval without intercept: %f",pval); 

Result:

pval without intercept: 0.726572
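
The degrees-of-freedom bookkeeping behind this fix can be sketched without any library. This is only an illustration, not part of the original code: the class AnovaDf and the methods residualDf and fStatistic are hypothetical names, and the RSS values are the rounded figures from the no-intercept R output in the question.

```java
// Illustrative sketch; AnovaDf, residualDf and fStatistic are hypothetical names.
final class AnovaDf {

    // Residual degrees of freedom: observations minus estimated coefficients.
    // The intercept counts as a coefficient only when it is actually fitted.
    static int residualDf(int observations, int predictors, boolean hasIntercept) {
        return observations - predictors - (hasIntercept ? 1 : 0);
    }

    // F statistic for two nested models, where model B is the larger one.
    static double fStatistic(double rssA, int dfA, double rssB, int dfB) {
        double msDiff = (rssA - rssB) / (dfA - dfB); // mean square of the added term
        double mse = rssB / dfB;                     // residual mean square of model B
        return msDiff / mse;
    }

    public static void main(String[] args) {
        int n = 10;                        // number of observations in the question
        int dfA = residualDf(n, 2, false); // geno_A + geno_B, no intercept
        int dfB = residualDf(n, 3, false); // plus the interaction term, no intercept
        // RSS values copied (rounded) from the R output for the no-intercept models.
        double f = fStatistic(0.78539, dfA, 0.77080, dfB);
        System.out.printf("dfA=%d dfB=%d F=%.4f%n", dfA, dfB, f);
    }
}
```

With these inputs the residual degrees of freedom are 8 and 7 and F is about 0.1325, in line with the R table. The p-value is then the upper tail of the F distribution with (dfA - dfB, dfB) degrees of freedom, which is what the corrected code above computes.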