Two-sample chi-squared test in R

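I'm very new to R, so please bear with me. I'm using a chi-squared test to compare nucleotide frequencies at a given position, and I have counted the number of A, C, G, and T in two different data sets: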

x1 <- c(272003,310418,201601,237168) 
x2 <- c(239614,316515,182070,198025)

I can think of two ways to ask for a two-sample chi-squared test:

> chisq.test(x1,x2) 

    Pearson's Chi-squared test 

data: x1 and x2 
X-squared = 12, df = 9, p-value = 0.2133 

Warning message: 
In chisq.test(x1, x2) : Chi-squared approximation may be incorrect

or

> chisq.test(cbind(x1,x2)) 

    Pearson's Chi-squared test 

data: cbind(x1, x2) 
X-squared = 2942.065, df = 3, p-value < 2.2e-16

I suspect the second version is the correct one, because I can also do this:

> chisq.test(x1,x1) 

    Pearson's Chi-squared test 

data: x1 and x1 
X-squared = 12, df = 9, p-value = 0.2133 

Warning message: 
In chisq.test(x1, x1) : Chi-squared approximation may be incorrect

which gives the same, obviously incorrect, result.

What is actually being computed in this case?

Thanks!

Source

2014-01-27 cjolley

chisq.test(x1,x1)$expected shows the following:

        x1
x1       201601 237168 272003 310418
  201601   0.25   0.25   0.25   0.25
  237168   0.25   0.25   0.25   0.25
  272003   0.25   0.25   0.25   0.25
  310418   0.25   0.25   0.25   0.25

And the observed counts (chisq.test(x1,x1)$observed):

        x1
x1       201601 237168 272003 310418
  201601      1      0      0      0
  237168      0      1      0      0
  272003      0      0      1      0
  310418      0      0      0      1

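So called this way, chisq.test assumes you are supplying paired observations: the two vectors are cross-tabulated as factors, and because each value occurs exactly once and pairs only with itself, the observed table has a 1 on the diagonal and 0 everywhere else. Given that table, the expected values are actually "right" (though silly in this case). The same thing happens with chisq.test(x1,x2): four distinct pairs, each observed once, which is why it also reports X-squared = 12 with df = 9. As a side note, chisq.test(cbind(x1,x1)) does what you would hope it to do (X-squared = 0, df = 3, p-value = 1).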
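To make that concrete, here is a minimal sketch that builds the cross-tabulation by hand and compares it with the matrix form. It assumes only base R; the names tab, res_pairs, and res_counts are made up for illustration.

```r
x1 <- c(272003, 310418, 201601, 237168)

# Given two vectors, chisq.test() cross-tabulates them as factors.
# Building that table explicitly shows what is being tested.
tab <- table(x1, x1)  # 4 x 4 table: 1 on the diagonal, 0 elsewhere

# Every expected count is 0.25, which triggers the
# "Chi-squared approximation may be incorrect" warning.
res_pairs <- suppressWarnings(chisq.test(tab))
res_pairs$statistic   # X-squared = 12
res_pairs$parameter   # df = 9

# cbind() instead treats the numbers as counts in a 4 x 2 table.
res_counts <- chisq.test(cbind(x1, x1))
res_counts$parameter  # df = 3
```

Any two vectors of four distinct values produce the same 4 x 4 pattern of ones and zeros (possibly permuted), so the result of the two-vector form says nothing about the counts themselves.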

Your second result looks fine, though:

> chisq.test(cbind(x1,x2))$observed
         x1     x2
[1,] 272003 239614
[2,] 310418 316515
[3,] 201601 182070
[4,] 237168 198025
> chisq.test(cbind(x1,x2))$expected
           x1       x2
[1,] 266912.4 244704.6
[2,] 327073.2 299859.8
[3,] 200162.6 183508.4
[4,] 227041.8 208151.2
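If it helps to see where those expected values come from, the following sketch recomputes them from the row and column totals and rebuilds the Pearson statistic by hand. This is an illustration in base R, not a substitute for chisq.test; the names counts, expected, statistic, df, and p_value are made up here.

```r
x1 <- c(272003, 310418, 201601, 237168)
x2 <- c(239614, 316515, 182070, 198025)
counts <- cbind(x1, x2)  # 4 x 2 table: rows are nucleotides, columns are data sets

# Expected count under independence: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic and degrees of freedom, (rows - 1) * (columns - 1)
statistic <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(statistic, df, lower.tail = FALSE)

# Compare against the built-in test
res <- chisq.test(counts)
all.equal(unname(res$statistic), statistic)
```

With counts this large, even small differences in proportions give a very small p-value, so it may be worth looking at the differences between observed and expected counts as well as the test result.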

Source

2014-01-27 06:15:22 PascalVKooten
