Find points over and under the confidence interval when using geom_stat / geom_smooth in ggplot2

I have a scatter plot and I want to know how I can find the genes above and below the confidence interval lines.

EDIT: Reproducible example:

library(ggplot2) 
#dummy data 
df <- mtcars[,c("mpg","cyl")] 

#plot 
ggplot(df,aes(mpg,cyl)) + 
    geom_point() + 
    geom_smooth()

2015-10-12 star

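You could start by providing your code and data. – nrussell

'identify(x, y, ...)' but it needs part of the data – Mateusz1981

Note that the confidence interval lines are a confidence interval for the mean of the data, not for the data itself. And because you have so much data, I would expect most of the values to be outside the interval. – bramtayl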

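I had to take a deep dive into the github repo, but I finally got it. To do this you need to know how stat_smooth works. In this specific case the loess function is called to do the smoothing (the other smoothing functions can be handled with the same process as below):

So, using loess in this case, we would do: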

#data (mtcars is a plain data.frame, so no with=FALSE is needed)
df <- mtcars[, c("mpg", "cyl")]
#run loess model
cars.lo <- loess(cyl ~ mpg, df)

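Then I had to read the ggplot2 source to see how the predictions are made internally in stat_smooth. Apparently Hadley uses the predictdf function (which is not exported from the namespace), which for our case looks like this: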

predictdf.loess <- function(model, xseq, se, level) {
    pred <- stats::predict(model, newdata = data.frame(x = xseq), se = se)

    if (se) {
        y = pred$fit
        ci <- pred$se.fit * stats::qt(level/2 + .5, pred$df)
        ymin = y - ci
        ymax = y + ci
        data.frame(x = xseq, y, ymin, ymax, se = pred$se.fit)
    } else {
        data.frame(x = xseq, y = as.vector(pred))
    }
}

After reading the above I was able to create my own data.frame of the predictions using:

#get the predictions i.e. the fit and se.fit vectors
pred <- predict(cars.lo, se=TRUE)
#create a data.frame from those
#note: the se.fit column holds the CI half-width (standard error times the t quantile)
df2 <- data.frame(mpg=df$mpg, fit=pred$fit, se.fit=pred$se.fit * qt(0.95/2 + .5, pred$df))

Looking at predictdf.loess we can see that the upper boundary of the confidence interval is created as pred$fit + pred$se.fit * qt(0.95/2 + .5, pred$df) and the lower boundary as pred$fit - pred$se.fit * qt(0.95/2 + .5, pred$df).
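Written out as a formula, the two boundaries are the fitted value plus or minus a t quantile times the standard error. The symbols below are my own shorthand and do not come from the ggplot2 source: \ell stands for the level (0.95 here), \nu for the degrees of freedom returned in pred$df, and \widehat{\mathrm{se}}(x) for pred$se.fit.

```latex
\hat{y}(x) \;\pm\; t_{\ell/2 + 0.5,\ \nu} \cdot \widehat{\mathrm{se}}(x),
\qquad \ell = 0.95 \;\Rightarrow\; \ell/2 + 0.5 = 0.975
```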

Using those, we can create a flag for the points over or under those boundaries:

#make the flag 
outerpoints <- +(df$cyl > df2$fit + df2$se.fit | df$cyl < df2$fit - df2$se.fit) 
#add flag to original data frame 
df$outer <- outerpoints

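The df$outer column is probably what the OP is looking for (it takes the value 1 if the point is outside the boundaries and 0 otherwise), but just for the sake of it I am plotting it below.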

Note that the + function above is only used to convert the logical flag into a numeric.

Now, if we plot it like this:

ggplot(df,aes(mpg,cyl)) + 
    geom_point(aes(colour=factor(outer))) + 
    geom_smooth()

we can clearly see the points inside and outside the confidence interval.


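P.S. For anyone who is interested in the upper and lower boundaries, they are created like this (speculation: the shaded area is probably drawn with geom_ribbon, or something similar, which makes it rounder and prettier):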

#upper boundary 
ggplot(df,aes(mpg,cyl)) + 
    geom_point(aes(colour=factor(outer))) + 
    geom_smooth() + 
    geom_line(data=df2, aes(mpg , fit + se.fit , group=1), colour='red') 

#lower boundary 
ggplot(df,aes(mpg,cyl)) + 
    geom_point(aes(colour=factor(outer))) + 
    geom_smooth() + 
    geom_line(data=df2, aes(mpg , fit - se.fit , group=1), colour='red')
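The steps in this answer can be collected into one small function. This is only a sketch: flag_outside_loess_ci is a hypothetical helper name (it is not part of ggplot2), and it assumes the smoother is loess with default settings, as in the case discussed above.

```r
# Hypothetical helper, not a ggplot2 function: fits loess(y ~ x), rebuilds the
# confidence band the way predictdf.loess does, and flags points outside it.
flag_outside_loess_ci <- function(x, y, level = 0.95) {
  fit  <- stats::loess(y ~ x)                 # loess with its default settings
  pred <- stats::predict(fit, se = TRUE)      # fit and se.fit at the observed x
  half <- pred$se.fit * stats::qt(level / 2 + 0.5, pred$df)  # CI half-width
  data.frame(
    x     = x,
    y     = y,
    ymin  = pred$fit - half,
    ymax  = pred$fit + half,
    outer = as.integer(y > pred$fit + half | y < pred$fit - half)
  )
}

# Same data as in the answer: cyl against mpg
res <- flag_outside_loess_ci(mtcars$mpg, mtcars$cyl)
table(res$outer)   # counts of points inside (0) and outside (1) the band
```

Returning ymin and ymax alongside the flag keeps the boundaries available for plotting, for example with geom_line as in the P.S. above.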

2015-10-16 15:06:45 LyzandeR

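Nice, I was just about to post a similar answer ;-) – Jaap

Thanks @Jaap :). Sorry, I know how it is from experience :). If you think it adds extra information, please post it. – LyzandeR

No need, there is nothing I could improve on your answer :-) (except some minor edits) – Jaap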

This solution takes advantage of the hard work ggplot2 does for you:

library(sp) 

# we have to build the plot first so ggplot can do the calculations 
ggplot(df,aes(mpg,cyl)) + 
    geom_point() + 
    geom_smooth() -> gg 

# do the calculations 
gb <- ggplot_build(gg) 

# get the CI data 
p <- gb$data[[2]] 

# make a polygon out of it 
poly <- data.frame(
    x=c(p$x[1], p$x, p$x[length(p$x)], rev(p$x)), 
    y=c(p$ymax[1], p$ymin, p$ymax[length(p$x)], rev(p$ymax)) 
) 

# test for original values in said polygon and add that to orig data 
# so we can color by it 
df$in_ci <- point.in.polygon(df$mpg, df$cyl, poly$x, poly$y) 

# re-do the plot with the new data 
ggplot(df,aes(mpg,cyl)) + 
    geom_point(aes(color=factor(in_ci))) + 
    geom_smooth()

It needs a bit of tweaking (i.e. the last point gets a value of 2), but I'm limited on time. Note that the point.in.polygon return values are:

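0: point is strictly exterior to pol
1: point is strictly interior to pol
2: point lies on the relative interior of an edge of pol
3: point is a vertex of pol

so it should be easy to change the code to TRUE/FALSE depending on whether the value is 0 or not.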
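As an alternative to the polygon test, the band limits can be interpolated at each point's x value and compared directly. This is a different technique from the one in the answer, shown only as a sketch: inside_band is a hypothetical helper name, and it assumes linear interpolation between the grid points of the band is close enough.

```r
# Hypothetical helper: TRUE where (x, y) lies within [ymin, ymax], with the
# band linearly interpolated between the x values it was computed on.
# Points whose x falls outside the range of band_x are reported as FALSE.
inside_band <- function(x, y, band_x, band_ymin, band_ymax) {
  lo <- stats::approx(band_x, band_ymin, xout = x)$y
  hi <- stats::approx(band_x, band_ymax, xout = x)$y
  !is.na(lo) & y >= lo & y <= hi
}

# Toy band with constant limits 1 and 3 over x in [0, 10]
band_x <- seq(0, 10, length.out = 5)
inside_band(c(2, 5, 8), c(2, 4, 0), band_x, rep(1, 5), rep(3, 5))
# TRUE FALSE FALSE
```

With the objects from the answer, the call would be inside_band(df$mpg, df$cyl, p$x, p$ymin, p$ymax), which gives a logical column directly and avoids the edge and vertex codes.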

2015-10-16 20:39:49 hrbrmstr

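Using ggplot_build as in @hrbrmstr's nice solution, you can actually do this by simply passing geom_smooth a sequence of x values at which the error bounds should be calculated, and making it equal to the x values of your points. Then you just check whether the y values are within the range.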

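library(ggplot2)

## dummy data, sorted by mpg so its rows line up with the layer data below
## even if ggplot2 returns that data ordered by x
df <- mtcars[, c("mpg", "cyl")]
df <- df[order(df$mpg), ]

## evaluate the smoother at the observed x values
## (older ggplot2 releases took this as params=list(xseq=df$mpg))
ggplot(df, aes(mpg, cyl)) +
    geom_smooth(xseq = df$mpg) -> gg

## Find the points within bounds
bounds <- ggplot_build(gg)$data[[1]]
df$inside <- with(df, bounds$ymin < cyl & bounds$ymax > cyl)

## Add the points
gg + geom_point(data=df, aes(color=inside)) + theme_bw()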

2015-10-16 21:42:10 jenesaisquoi
