Python's scipy.stats.ranksums vs. R's wilcox.test

Python's scipy.stats.ranksums and R's wilcox.test are both supposed to compute a two-sided p-value for the Wilcoxon rank-sum test. But when I run both functions on the same data, I get p-values that differ by orders of magnitude:

R：

> x=c(57.07168,46.95301,31.86423,38.27486,77.89309,76.78879,33.29809,58.61569,18.26473,62.92256,50.46951,19.14473,22.58552,24.14309) 
> y=c(8.319966,2.569211,1.306941,8.450002,1.624244,1.887139,1.376355,2.521150,5.940253,1.458392,3.257468,1.574528,2.338976) 
> print(wilcox.test(x, y)) 

     Wilcoxon rank sum test 

data: x and y 
W = 182, p-value = 9.971e-08 
alternative hypothesis: true location shift is not equal to 0

Python:

>>> import scipy.stats
>>> x=[57.07168,46.95301,31.86423,38.27486,77.89309,76.78879,33.29809,58.61569,18.26473,62.92256,50.46951,19.14473,22.58552,24.14309]
>>> y=[8.319966,2.569211,1.306941,8.450002,1.624244,1.887139,1.376355,2.521150,5.940253,1.458392,3.257468,1.574528,2.338976] 
>>> scipy.stats.ranksums(x, y) 
(4.415880433163923, 1.0059968254463979e-05)

So R gives me 1e-7, while Python gives me 1e-5.

Where does this difference come from, and which one is the "correct" p-value?

Posted 2012-10-09 by Nils

It depends on the choice of options (exact vs. normal approximation, with or without continuity correction):

R's default:

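By default (if "exact" is not specified), an exact p-value is computed if the samples contain fewer than 50 finite values and there are no ties. Otherwise, a normal approximation is used.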

Default (as shown above):

> wilcox.test(x, y)

    Wilcoxon rank sum test 

data: x and y 
W = 182, p-value = 9.971e-08 
alternative hypothesis: true location shift is not equal to 0
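
The exact p-value above can be reproduced by counting rank arrangements directly. The following is a minimal sketch in pure Python, assuming no ties; the helper names `u_distribution` and `exact_two_sided_p` are made up for illustration and are not part of R or SciPy. It builds the null distribution of the Mann-Whitney U statistic (reported by R as W) and doubles the smaller tail.

```python
from math import comb

x = [57.07168, 46.95301, 31.86423, 38.27486, 77.89309, 76.78879, 33.29809,
     58.61569, 18.26473, 62.92256, 50.46951, 19.14473, 22.58552, 24.14309]
y = [8.319966, 2.569211, 1.306941, 8.450002, 1.624244, 1.887139, 1.376355,
     2.521150, 5.940253, 1.458392, 3.257468, 1.574528, 2.338976]


def u_distribution(n1, n2):
    """Counts of each U value (0..n1*n2) over all ways to assign n1 of the
    n1+n2 ranks to the first sample (hypothetical helper, assumes no ties)."""
    n = n1 + n2
    max_sum = sum(range(n2 + 1, n + 1))  # rank sum of the n1 largest ranks
    # ways[k][s] = number of k-subsets of the ranks seen so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(n1 + 1)]
    ways[0][0] = 1
    for r in range(1, n + 1):
        # go downward in k so that each rank is used at most once
        for k in range(min(r, n1), 0, -1):
            for s in range(r, max_sum + 1):
                ways[k][s] += ways[k - 1][s - r]
    offset = n1 * (n1 + 1) // 2  # smallest possible rank sum
    return ways[n1][offset:offset + n1 * n2 + 1]


def exact_two_sided_p(x, y):
    """Return (U, exact two-sided p-value); assumes there are no ties."""
    n1, n2 = len(x), len(y)
    u = sum(xi > yj for xi in x for yj in y)  # pairs where x beats y
    dist = u_distribution(n1, n2)
    total = sum(dist)                    # equals comb(n1 + n2, n1)
    lower = sum(dist[:u + 1]) / total    # P(U <= u)
    upper = sum(dist[u:]) / total        # P(U >= u)
    return u, min(1.0, 2 * min(lower, upper))


u, p = exact_two_sided_p(x, y)
print(u, p)  # U = 182, p is about 9.971e-08
```

Here every value in x exceeds every value in y, so only one of the comb(27, 14) arrangements is as extreme in that direction, and the two-sided p-value is 2 / comb(27, 14). Recent SciPy releases (1.7 and later) also offer an exact mode through scipy.stats.mannwhitneyu(x, y, alternative='two-sided', method='exact').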

Normal approximation with continuity correction:

> wilcox.test(x, y, exact=FALSE, correct=TRUE) 

    Wilcoxon rank sum test with continuity correction 

data: x and y 
W = 182, p-value = 1.125e-05 
alternative hypothesis: true location shift is not equal to 0

Normal approximation without continuity correction:

> (w0 <- wilcox.test(x, y, exact=FALSE, correct=FALSE)) 

    Wilcoxon rank sum test 

data: x and y 
W = 182, p-value = 1.006e-05 
alternative hypothesis: true location shift is not equal to 0

For a little more precision:

> w0$p.value
[1] 1.005997e-05

It looks like the other value Python is giving you (4.415880433163923) is the Z-score:

> 2*pnorm(4.415880433163923,lower.tail=FALSE)
[1] 1.005997e-05
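
The same normal-approximation numbers can be rebuilt by hand, which shows where the Z-score comes from. This is a minimal sketch in pure Python, assuming no ties; the function name `ranksum_normal_approx` is hypothetical, and the continuity correction follows the usual textbook form (shrink the deviation from the mean by 0.5) rather than being copied from either library's source.

```python
from math import copysign, erfc, sqrt

x = [57.07168, 46.95301, 31.86423, 38.27486, 77.89309, 76.78879, 33.29809,
     58.61569, 18.26473, 62.92256, 50.46951, 19.14473, 22.58552, 24.14309]
y = [8.319966, 2.569211, 1.306941, 8.450002, 1.624244, 1.887139, 1.376355,
     2.521150, 5.940253, 1.458392, 3.257468, 1.574528, 2.338976]


def ranksum_normal_approx(x, y, continuity=False):
    """Return (z, two-sided p) from the normal approximation.

    Hypothetical helper: assumes no ties, so no tie correction is applied.
    """
    n1, n2 = len(x), len(y)
    u = sum(xi > yj for xi in x for yj in y)   # Mann-Whitney U (R's W)
    mean = n1 * n2 / 2                         # E[U] under the null
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # SD of U under the null
    diff = u - mean
    if continuity and diff != 0:
        diff -= copysign(0.5, diff)            # move 0.5 toward the mean
    z = diff / sd
    p = erfc(abs(z) / sqrt(2))                 # equals 2 * P(Z > |z|)
    return z, p


z, p = ranksum_normal_approx(x, y)
z_cc, p_cc = ranksum_normal_approx(x, y, continuity=True)
print(z, p)        # about 4.41588 and 1.006e-05 (matches scipy.stats.ranksums)
print(z_cc, p_cc)  # p is about 1.125e-05 (matches R with correct=TRUE)
```

With 14 and 13 observations, U = 182, the null mean is 91 and the null variance is 14 * 13 * 28 / 12, which gives the Z-score of about 4.4159 that Python reported.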

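I can appreciate wanting to know what is going on, but I would also point out that there is rarely any practical difference between p=1e-7 and p=1e-5 ...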

Answered 2012-10-09 11:45:05

Yes, SciPy returns the z-score here: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ranksums.html

I believe there may also be differences in how ties are handled, which might need special treatment in SciPy. – seberg

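Thanks for the explanation! Do you know whether there is a way to force SciPy to compute an exact p-value and handle ties? I know SciPy has an alternative function, scipy.stats.mannwhitneyu, which handles ties and applies a continuity correction, but it is still not exact, and the documentation states that I should have at least 20 samples. – Nils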
