使用Base R，我想知道是否可以確定下面表示爲posterior的曲線下的95％面積？我們可以使用Base R來查找曲線下95％的面積嗎？

更具體地說，我想從mode（綠色虛線）移向尾部，然後在覆蓋95％的曲線區域時停止。所需的是這個95％區域的極限值，如下圖所示？

 prior = function(x) dbeta(x, 15.566, 7.051) 
likelihood = function(x) dbinom(55, 100, x) 
posterior = function(x) prior(x)*likelihood(x) 

mode = optimize(posterior, interval = c(0, 1), maximum = TRUE, tol = 1e-12)[[1]] 

curve(posterior, n = 1e4)

P.S換句話說，它是高度可取的，如果這樣一個時間間隔是最短的95％的時間間隔可能的。

來源

2017-08-15 rnorouzian

對稱分佈

儘管OP的例子是不完全對稱，這是足夠接近 - 並從那裏開始有用的，因爲該解決方案是簡單得多。可以使用integrate和optimize的組合。我將其作爲自定義函數編寫，但請注意，如果在其他情況下使用此函數，則可能需要重新考慮搜索分位數的界限。

# For a distribution with a single peak, find the symmetric! 
# interval that contains probs probability. Search over 'range'. 
f_quan <- function(fun, probs, range=c(0,1)){ 

    mode <- optimize(fun, interval = range, maximum = TRUE, tol = 1e-12)[[1]] 

    total_area <- integrate(fun, range[1], range[2])[[1]] 

    O <- function(d){ 
    parea <- integrate(fun, mode-d, mode+d)[[1]]/total_area 
    (probs - parea)^2 
    } 
    # Bounds for searching may need some adjustment depending on the problem! 
    o <- optimize(O, c(0,range[2]/2 - 1E-02))[[1]] 

return(c(mode-o, mode+o)) 
}

使用它像這樣，

f <- f_quan(posterior, 0.95) 
curve(posterior, n = 1e4) 
abline(v=f, col="blue", lwd=2, lty=3)

給

不對稱分佈

在非對稱分佈的情況下，我們必須尋找兩點符合標準P（a < x < b）= Prob，其中Prob是一些期望的概率。由於存在無限多的間隔（a，b），OP建議找到最短的一個。

解決方案中重要的是我們要搜索的區域domain的定義（我們不能使用-Inf, Inf，因此用戶必須將其設置爲合理的值）。

# consider interval (a,b) on the x-axis 
# integrate our function, normalize to total area, to 
# get the total probability in the interval 
prob_ab <- function(fun, a, b, domain){ 
    totarea <- integrate(fun, domain[1], domain[2])[[1]] 
    integrate(fun, a, b)[[1]]/totarea 
} 

# now given a and the probability, invert to find b 
invert_prob_ab <- function(fun, a, prob, domain){ 

    O <- function(b, fun, a, prob){ 
    (prob_ab(fun, a, b, domain=domain) - prob)^2 
    } 

    b <- optimize(O, c(a, domain[2]), a = a, fun=fun, prob=prob)$minimum 

return(b) 
} 

# now find the shortest interval by varying a 
# Simplification: don't search past the mode, otherwise getting close 
# to the right-hand side of domain will give serious trouble! 
prob_int_shortest <- function(fun, prob, domain){ 

    mode <- optimize(fun, interval = domain, maximum = TRUE, tol = 1e-12)[[1]] 

    # objective function to be minimized: the width of the interval 
    O <- function(a, fun, prob, domain){ 
    b <- invert_prob_ab(fun, a, prob, domain) 

    b - a 
    } 

    # shortest interval that meets criterium 
    abest <- optimize(O, c(0,mode), fun=fun, prob=prob, domain=domain)$minimum 

    # now return the interval 
    b <- invert_prob_ab(fun, abest, prob, domain) 

return(c(abest,b)) 
}

現在使用上面的代碼。我使用非常不對稱的函數（假設mydist實際上是一些複雜的pdf，而不是dgamma）。

mydist <- function(x)dgamma(x, shape=2) 
curve(mydist(x), from=0, to=10) 
abline(v=prob_int_shortest(mydist, 0.9, c(0,10)), lty=3, col="blue", lwd=2)

在這個例子中我設置域至（0,10），由於顯然的間隔必須在某處。請注意，使用非常大的值（例如（0，1E05））不起作用，因爲integrate在接近零的長序列時遇到問題。再次，對於你的情況，你將不得不調整域名（除非有人有更好的主意！）。

來源

2017-08-15 23:45:38

界限是問題：如果你搜索整個域（在你的案例中爲0-1），我們會遇到問題，因爲函數沒有定義在0或1（但它在附近）。在函數d中，距離模式的距離是變化的，以便找到（模式-d）到（模式+ d）的積分等於請求的概率（在你的情況下爲0.95）的d。因此，這隻適用於對稱函數，否則你必須優化兩個參數。 –

我認爲如果它是不對稱的，這個問題不會有單一的解決方案！你可以找到許多可以整合到一定概率的pdf的間隔。或者，你實際上是在尋找2.5％和97.％的分位數（這些分位數會整合到95％之間）？如果是這樣，那可以做到。 –

這是可以做到的 - 但請注意，與您提出的原始問題完全不同！我毫不猶豫地編輯我的帖子，因爲這本身就是有用的。我可能會添加另一個答案。 –

這裏是一個解決方案利用所述Trapezoidal rule的。您會注意到@Remko提供的解決方案遠遠優越，但是該解決方案有希望增加一些教學價值，因爲它可以將複雜問題簡化爲幾何，算術和基本編程結構，如for loops。

findXVals <- function(lim, p) { 
    ## (1/p) is the precision 

    ## area of a trapezoid 
    trapez <- function(h1, h2, w) {(h1 + h2) * w/2} 

    yVals <- posterior((1:(p - 1))/p) 
    m <- which.max(yVals) 
    nZ <- which(yVals > 1/p) 

    b <- m + 1 
    e <- m - 1 
    a <- f <- m 

    area <- 0 
    myRng <- 1:(length(nZ)-1) 
    totArea <- sum(trapez(yVals[nZ[myRng]], yVals[nZ[myRng+1]], 1/p)) 
    targetArea <- totArea * lim 

    while (area < targetArea) { 
     area <- area + trapez(yVals[a], yVals[b], 1/p) + trapez(yVals[e], yVals[f], 1/p) 
     a <- b 
     b <- b + 1 
     f <- e 
     e <- e - 1 
    } 

    c((a - 1)/p, (f + 1)/p) 
} 

findXVals(.95, 10^5) 
[1] 0.66375 0.48975

來源

2017-08-16 01:46:51

我們可以使用Base R來查找曲線下95％的面積嗎？

回答

對稱分佈

不對稱分佈

相關問題