Distance between the prediction interval and the data (stat_smooth)

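I am using stat_smooth() in R for the first time, and I would like to know whether there is a way to get, for each x, the distance between the data (y) and the prediction interval, as illustrated in the picture here (image: distance between the prediction interval and the data).

Thank you for your valuable help!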

2015-09-25 Ezay

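Can you share your data, using `dput(data)`, and the code you used to produce the plot?

How should data points inside the interval be handled? For those cases, `distance == 0`?

Please read the information on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to produce a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). This will make it easier for others to help you. – Jaap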

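As pointed out in the comments above, clarifying your goal would help.

If you want to replicate what ggplot2 does and find the distance from the interval for the points that fall outside it, I have some code for you.

First, I create some sample data and plot it: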

library(ggplot2)

# sample data: a linear trend plus Gaussian noise
set.seed(1234)
x <- 1:100
y <- x + rnorm(100, sd = 5)
df <- data.frame(x, y)

# scatter plot with the smoother and its confidence band
ggplot(df, aes(x, y)) +
    geom_point(alpha = .4) +
    stat_smooth(span = .3)

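Then I replicate what ggplot2 does: I fit a loess model (ggplot2 chooses loess when n < 1000) and then build the confidence interval the same way stat_smooth does. Note: the model parameters need to match the ones you used in stat_smooth.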

# fit the model, matching the span parameter from the plot above
model <- loess(y ~ x, data = df, span = 0.3)

# x values at which to evaluate the fit
xseq <- sort(unique(df$x))

# function adapted from ggplot2::predictdf.loess:
# https://github.com/hadley/ggplot2/blob/f3b519aa90907f13f5d649ff6a512fd539f18b2b/R/stat-smooth-methods.r#L45
predict_loess <- function(model, xseq, level = 0.95) {
    pred <- stats::predict(model, newdata = data.frame(x = xseq), se = TRUE)

    y_pred <- pred$fit
    # half-width of the band: standard error times the t quantile
    ci <- pred$se.fit * stats::qt(level / 2 + .5, pred$df)
    ymin <- y_pred - ci
    ymax <- y_pred + ci

    data.frame(x = xseq, y_pred, ymin, ymax, se = pred$se.fit)
}

# predict your data
predicted_data <- predict_loess(model, xseq, level = 0.95)

# merge predicted data with original y, matching rows on x
# (this stays aligned even if df is not sorted by x)
merged_data <- merge(predicted_data, df, by = "x")

head(merged_data)
#   x     y_pred       ymin     ymax       se         y
# 1 1 -0.5929504 -5.8628535 4.676953 2.652067 -5.035329
# 2 2  0.2828659 -4.1520646 4.717796 2.231869  3.387146
# 3 3  1.1796057 -2.5623056 4.921517 1.883109  8.422206
# 4 4  2.1074914 -1.0994171 5.314400 1.613870 -7.728489
# 5 5  3.0696584  0.2371895 5.902127 1.425434  7.145623
# 6 6  4.0568034  1.4454944 6.668113 1.314136  8.530279
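
One way to check that the replication matches what ggplot2 draws is to pull the computed band out of the plot with layer_data() and compare it with a manual loess prediction. This is only a sketch under a few assumptions that are not in the original answer: method = "loess" and formula = y ~ x are spelled out explicitly, and n = 100 is set so that stat_smooth evaluates the fit at x = 1, ..., 100 for this particular sample data.

```r
library(ggplot2)

# same sample data as above
set.seed(1234)
df <- data.frame(x = 1:100)
df$y <- df$x + rnorm(100, sd = 5)

# n = 100 asks for 100 evaluation points over range(x), i.e. x = 1, ..., 100 here
p <- ggplot(df, aes(x, y)) +
    geom_point(alpha = .4) +
    stat_smooth(method = "loess", formula = y ~ x, span = .3, n = 100)

# layer 2 is the smoother; its data holds x, y, ymin, ymax and se
band <- layer_data(p, 2)

# manual replication with the same span and the default 95% level
model <- loess(y ~ x, data = df, span = 0.3)
pred <- predict(model, newdata = data.frame(x = 1:100), se = TRUE)
ci <- pred$se.fit * qt(0.975, pred$df)

# both comparisons should report agreement if the replication is faithful
all.equal(band$ymin, unname(pred$fit - ci))
all.equal(band$ymax, unname(pred$fit + ci))
```

If the two comparisons disagree, the most likely cause is a parameter (span, level, or the evaluation points) that differs between the plot and the manual model.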

From the replicated data, we can now find the distances. For points inside the interval, the distance is 0.

distances <- with(merged_data, ifelse(y < ymin, ymin - y, 
             ifelse(y > ymax, y - ymax, 0))) 
head(distances) 
# [1] 0.000000 0.000000 3.500689 6.629071 1.243496 1.862167
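
The nested ifelse() can also be written with pmax(), which may read a little more directly. The sketch below is self-contained: it rebuilds the first six rows of merged_data from the printed output above rather than refitting the model, and the signed variant is a hypothetical addition that is not part of the original answer.

```r
# first six rows of merged_data, copied from the printed output above
merged_data <- data.frame(
    y    = c(-5.035329, 3.387146, 8.422206, -7.728489, 7.145623, 8.530279),
    ymin = c(-5.8628535, -4.1520646, -2.5623056, -1.0994171, 0.2371895, 1.4454944),
    ymax = c(4.676953, 4.717796, 4.921517, 5.314400, 5.902127, 6.668113)
)

# distance to the nearest band edge; 0 when y lies inside [ymin, ymax]
distances <- with(merged_data, pmax(ymin - y, y - ymax, 0))

# signed variant (hypothetical): negative below the band, positive above it
signed_distances <- with(merged_data, pmax(y - ymax, 0) - pmax(ymin - y, 0))

round(distances, 3)
# [1] 0.000 0.000 3.501 6.629 1.243 1.862
```

Because ymin is never larger than ymax, at most one of the two differences is positive, so taking the element-wise maximum with 0 gives the same result as the ifelse() version.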

This is not a very elegant solution, but it may point you in the right direction.


2015-09-25 18:18:22
