2013-01-07 29 views
2

How do I calculate the 95% credible limit of the area under a curve in R?

I have the following distribution:

x<-c(22.5,28.14285714,33.78571429,39.42857143,45.07142857,50.71428571,56.35714286,62,67.64285714,73.28571429,78.92857143,84.57142857,90.21428571,95.85714286,101.5,107.1428571,112.7857143,118.4285714,124.0714286,129.7142857,135.3571429,141,146.6428571,152.2857143,157.9285714,163.5714286,169.2142857,174.8571429,180.5,186.1428571,191.7857143,197.4285714,203.0714286,208.7142857,214.3571429,220,225.6428571,231.2857143,236.9285714,242.5714286,248.2142857,253.8571429,259.5,265.1428571,270.7857143,276.4285714,282.0714286,287.7142857,293.3571429,299) 
y<-c(0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.00328839614285714,0.00296425985714286,0.002655899,0.00236187857142857,0.002080895,0.00181184271428571,0.00155376085714286,0.00130578928571429,0.001074706,0.000877193,0.000709397142857142,0.000567189714285714,0.000447254,0.000346858571428571,0.000263689142857143,0.000195768428571429,0.000141427,9.92657142857141e-05,6.77857142857142e-05,4.48571428571428e-05,2.86428571428571e-05,1.75142857142857e-05,1.01357142857143e-05,5.52e-06,2.78857142857142e-06,1.27285714285713e-06,5.00714285714284e-07,1.5742857142857e-07,3.29857142857142e-08,2.78857142857137e-09,1.74e-12) 

plot(x,y) 

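I want to find the value of x that splits the distribution into a region with 0.95 of the area to its left and 0.05 of the area to its right (a one-tailed 95% credible interval).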

I think I have to fit my empirical curve to a function and then integrate that function to get the value I need, but I don't know where to start.

How can this be done in R?

+0

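GSee's answer is the way to go. But I'd like to point out that numerically integrating the source data is not only easier than creating a fitted function and integrating it, it also (in general) has smaller computational error. –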

+2

@CarlWitthoft, I'm not so sure about my answer (which was `quantile(x, 0.95)`). It splits `x` into 95% and 5%, but it doesn't take the area (`y`) into account at all. – GSee

+0

@Gsee - I think you should generate the quantiles from a Simpson's-rule integral. I still wouldn't produce a fitted function, tho. –

Answers

2

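This is an integration problem (the sum under the curve). You could split your integral into a rectangular part and a curved part. However, you can get a quick and dirty approximation using splines: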

y<-c(0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.003541755,0.00328839614285714,0.00296425985714286,0.002655899,0.00236187857142857,0.002080895,0.00181184271428571,0.00155376085714286,0.00130578928571429,0.001074706,0.000877193,0.000709397142857142,0.000567189714285714,0.000447254,0.000346858571428571,0.000263689142857143,0.000195768428571429,0.000141427,9.92657142857141e-05,6.77857142857142e-05,4.48571428571428e-05,2.86428571428571e-05,1.75142857142857e-05,1.01357142857143e-05,5.52e-06,2.78857142857142e-06,1.27285714285713e-06,5.00714285714284e-07,1.5742857142857e-07,3.29857142857142e-08,2.78857142857137e-09,1.74e-12) 
x<-c(22.5,28.14285714,33.78571429,39.42857143,45.07142857,50.71428571,56.35714286,62,67.64285714,73.28571429,78.92857143,84.57142857,90.21428571,95.85714286,101.5,107.1428571,112.7857143,118.4285714,124.0714286,129.7142857,135.3571429,141,146.6428571,152.2857143,157.9285714,163.5714286,169.2142857,174.8571429,180.5,186.1428571,191.7857143,197.4285714,203.0714286,208.7142857,214.3571429,220,225.6428571,231.2857143,236.9285714,242.5714286,248.2142857,253.8571429,259.5,265.1428571,270.7857143,276.4285714,282.0714286,287.7142857,293.3571429,299) 

sp <- smooth.spline(x, y)   # smoothing spline fitted to the data
f <- function(t) {
    predict(sp, t)$y        # spline value at t
}

N <- 500                    # this is an accuracy parameter (number of grid points)
xBis <- seq(x[1], x[length(x)], length.out = N)
yBis <- f(xBis)             # spline evaluated on the fine grid, not on x
dx <- xBis[2] - xBis[1]     # grid spacing

J <- function(input) {
    # This function takes input in 1:N and returns the Riemann sum
    # of the spline over the first `input` grid points
    sum(yBis[seq_len(input)]) * dx
}
######
I <- J(N)   # This is the value of the sum under the curve
# It should be roughly equal (given the shape of the curve) to:
index <- max(which(y == y[1]))
I_check <- (x[index] - x[1]) * y[index] * 3/2   # sanity check only; do not overwrite I
######
res <- sapply(1:N, J) / I   # cumulative fraction of the area
Index5 <- max(which(res <= .05))
Index95 <- min(which(res >= .95))

x5 <- xBis[Index5]     # This is the 5% quantile
x95 <- xBis[Index95]   # This is the 95% quantile

HTH

Let me know if anything is unclear.

PS: I think there are better ways to do this.
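A possibly tidier variant of the same idea, offered only as a sketch and not as part of the original answer: interpolate the points with `splinefun`, let `integrate` compute the area, and let `uniroot` solve for the x at which the cumulative area reaches a fraction `p` of the total. The helper name `spline_area_quantile` and the demo data are hypothetical, and an interpolating spline may overshoot slightly near sharp corners in real data.

```r
# Hypothetical helper: x at which the area under an interpolating spline
# reaches a fraction p (0 < p < 1) of the total area over range(x).
spline_area_quantile <- function(x, y, p) {
    f <- splinefun(x, y, method = "natural")   # interpolating cubic spline
    a <- min(x)
    b <- max(x)
    area_to <- function(t) {
        if (t <= a) return(0)                  # empty interval has zero area
        integrate(f, lower = a, upper = t)$value
    }
    total <- area_to(b)
    # root of (area up to t) - p * total on [a, b]
    uniroot(function(t) area_to(t) - p * total,
            interval = c(a, b), tol = 1e-9)$root
}

# Illustrative data (not the question's): a flat curve on [0, 10]
x_demo <- seq(0, 10, length.out = 11)
y_demo <- rep(0.1, 11)
q95 <- spline_area_quantile(x_demo, y_demo, 0.95)
```

For the flat demo curve the 95% point should land at about x = 9.5, which is an easy check before trying the helper on the `x` and `y` from the question.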

4

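As the other answers point out, this is an area-under-the-curve problem, combined with finding the location where the area reaches 95% of the total. I take a simpler approach to the integration than David's answer. Rather than interpolating the curve and integrating that, I just use the trapezoidal rule to get the area contributed by each interval. These individual areas are then accumulated into the total area. Finally, you find the index at which the cumulative area exceeds 95% of the total, and a line can be drawn there.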

piece_area <- c(0, (x[-1] - x[-length(x)])*(y[-1] + y[-length(y)])/2) 
cum_area <- cumsum(piece_area) 
total_area <- cum_area[length(cum_area)] 
idx095 <- min(which(cum_area > 0.95 * total_area)) 

abline(v = x[idx095]) 

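A higher-resolution estimate of the exact point at which 95% is crossed can be obtained by using more points in the original sampling of the distribution.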
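If resampling is not an option, one alternative is to interpolate linearly within the interval where the threshold is crossed instead of snapping to a grid point. The sketch below does this with `approx` on the cumulative trapezoidal area. The helper name `area_quantile` and the demo data are hypothetical, and it assumes `y > 0` everywhere so that the cumulative area is strictly increasing. The linear inversion is itself an approximation, since the trapezoidal cumulative area is piecewise quadratic between points.

```r
# Hypothetical helper: x-value below which a fraction p of the
# trapezoidal area lies, interpolated between grid points.
area_quantile <- function(x, y, p) {
    # area contributed by each interval (trapezoidal rule), with a leading 0
    piece_area <- c(0, diff(x) * (y[-1] + y[-length(y)]) / 2)
    cum_area <- cumsum(piece_area)
    total_area <- cum_area[length(cum_area)]
    # invert the cumulative-area curve by linear interpolation
    approx(x = cum_area, y = x, xout = p * total_area)$y
}

# Illustrative data (not the question's): a flat curve on [0, 10]
x_demo <- seq(0, 10, length.out = 11)
y_demo <- rep(0.1, 11)
q95_trap <- area_quantile(x_demo, y_demo, 0.95)
```

With the question's data, `abline(v = area_quantile(x, y, 0.95))` would draw the line at the interpolated position.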