What is the fastest way to calculate the median when values have different sampling probabilities?

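A teacher wants to calculate the median height of the students in his class. Not every student attends every day, so the median height computed on any given day may differ. The table below lists each student's height and their probability of being in class. With this information, he can estimate the expected median.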

>set.seed(123) 
>data1 <- data.frame(Student=c(LETTERS[1:10]), Height.cm=sort(rnorm(n=10, mean=140, sd=10)), Prob.in.class=c(1,.75,1,.5,1,1,1,.25,1,.5)) 

>data1 

   Student Height.cm Prob.in.class
1        A  127.3494          1.00
2        B  133.1315          0.75
3        C  134.3952          1.00
4        D  135.5434          0.50
5        E  137.6982          1.00
6        F  140.7051          1.00
7        G  141.2929          1.00
8        H  144.6092          0.25
9        I  155.5871          1.00
10       J  157.1506          0.50
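One way to make the "expected median" concrete is to simulate attendance directly. The sketch below is an illustration rather than part of the original question: it assumes students attend independently of one another, and `simulate_medians` and `n_days` are hypothetical names. It is slow, so it is probably most useful as a sanity check on faster methods; note that a weighted median and the average of daily medians are different quantities and need not agree.

```r
# Data from the question
set.seed(123)
data1 <- data.frame(
  Student = LETTERS[1:10],
  Height.cm = sort(rnorm(n = 10, mean = 140, sd = 10)),
  Prob.in.class = c(1, .75, 1, .5, 1, 1, 1, .25, 1, .5)
)

# Hypothetical helper: simulate n_days of attendance (assumed independent
# across students) and return the median height of those present each day.
simulate_medians <- function(height, prob, n_days = 10000) {
  vapply(seq_len(n_days), function(i) {
    present <- runif(length(prob)) < prob  # TRUE if the student attends
    median(height[present])
  }, numeric(1))
}

daily_medians <- simulate_medians(data1$Height.cm, data1$Prob.in.class)
mean(daily_medians)  # Monte Carlo estimate of the expected daily median
```

Increasing `n_days` reduces the simulation noise at the cost of run time.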

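What is the fastest way in R to estimate the median (or an arbitrary quantile) of a distribution like this?

For my actual calculation, I need to estimate the median and arbitrary quantiles for hundreds of different vectors, each with tens of thousands of points (and associated probabilities). I have seen the probability density function estimated with the trapezoid method, but I am not sure that is the best approach.

Any advice you can offer would be greatly appreciated. Thanks!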

Source

2017-04-17 Ricola

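I don't know, but I think this (a weighted quantile) is fine as long as you are careful with the weights vector. A Google search for "weighted quantile r" turns up https://artax.karlin.mff.cuni.cz/r-help/library/reldist/html/wtd.quantile.html or http://artax.karlin.mff.cuni.cz/r-help/library/PSCBS/html/weightedQuantile.html or https://github.com/hadley/bigvis/blob/master/R/weighted-stats.r? You could benchmark some of these solutions... –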

Thanks for pointing me in the right direction, @BenBolker. It looks like 'PSCBS::weightedQuantile' and 'reldist::wtd.quantile' just use 'Hmisc::wtd.quantile', so I will stick with the original. – Ricola

Something like this should work, but note the caveat about the weights shown below:

#your data 
set.seed(123) 
data1 <- data.frame(Student=c(LETTERS[1:10]), Height.cm=sort(rnorm(n=10, mean=140, sd=10)), Prob.in.class=c(1,.75,1,.5,1,1,1,.25,1,.5)) 

#Test a known ... 
data2 <- c(1,1,1,1,1,2,3,3,3,3,3) # median clearly 2 
median(data2) #yields 2, yah... 

#using weights... median should be 2 if function working right 
data3 <- data.frame(Student=c(LETTERS[1:3]), Height.cm=c(1,2,3), Prob.in.class=c(5/12,2/12,5/12)) 
reldist::wtd.quantile(data3$Height.cm, q = .5, 
        weight = data3$Prob.in.class) # yields 3, not the right answer 

#the wtd.quantile function does not seem to handle weights given as probabilities. 
#multiplying the weights so that they are greater than 1 seems to work. 
reldist::wtd.quantile(data3$Height.cm, q = .5, weight = data3$Prob.in.class*100) # yields 2, the right answer
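Because the result above depends on the scale of the weights, one alternative is a small base-R helper that normalises the weights before inverting the weighted empirical CDF. This is only a sketch under stated assumptions: `weighted_quantile` is a hypothetical name, not a function from reldist, Hmisc, or PSCBS, and it returns the smallest observed value whose cumulative weight reaches the requested probability (no interpolation), so its results can differ from those packages.

```r
# Hypothetical helper (not from any package): weighted quantile obtained by
# inverting the weighted empirical CDF. Weights are normalised to sum to 1,
# so multiplying them by a constant does not change the result.
weighted_quantile <- function(x, w, probs = 0.5) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  x <- x[ord]
  cum_w <- cumsum(w[ord]) / sum(w)
  # index of the smallest x whose cumulative weight reaches each probability;
  # the small tolerance guards against floating-point round-off
  idx <- vapply(probs, function(p) which(cum_w >= p - 1e-12)[1], integer(1))
  x[idx]
}

x <- c(1, 2, 3)
w <- c(5/12, 2/12, 5/12)
weighted_quantile(x, w, probs = 0.5)        # 2
weighted_quantile(x, w * 100, probs = 0.5)  # 2, unchanged by rescaling
```

For hundreds of vectors, the helper could be applied with `lapply` or `mapply` over a list of value and weight vectors; whether it beats the package functions would need benchmarking on the real data.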

Source

2017-04-17 20:54:23 Kgrey
