2017-07-25 39 views

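Fitting data with a proportion as the dependent variable. Below is a simplified version of my problem, with example data.

Every year, I find 40 balls in my yard. A certain proportion of them are red. I want to model the proportion of red balls over time.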

library(tidyverse) 
library(modelr) 

# generate some proportion data that changes by year 
data = tibble(
    year = 2011:2020, 
    reds = 1:10, # red balls 
    total = 40, # total number of balls 
    propRed = reds/total # proportion of red balls each year 
) 

# fit to a model 
model = glm(propRed ~ year, XXX_WHAT_GOES_HERE_XXX, data) 

# graph the model's prediction and the data 
tibble(year = 2000:2030) %>% 
    modelr::add_predictions(model, "propRed") %>% 
    ggplot() + 
    aes(y=propRed, x=year) + 
    geom_line() + 
    geom_point(data=data) 

This can be a logistic regression. Call glm with something like: glm(cbind(reds, total - reds) ~ year, family = 'binomial', data = data) – bouncyball


It is unclear what you are asking. Also, you may want to post this on Cross Validated. – www


@bouncyball: I ran tibble(year = 2000:2030) %>% predict.glm(model, .) and it predicted negative values, which should not be possible for a proportion. – sharoz

Answer


This is a case where we can use logistic regression, using the cbind(successes, failures) form of the response in the formula interface of glm:

model <- glm(cbind(reds, total - reds) ~ year, family = 'binomial', data = data) 

tibble(year = 2000:2030) %>% 
    mutate(propRed = predict(model, newdata = ., type = 'response')) %>% 
    ggplot() + 
    aes(y=propRed, x=year) + 
    geom_line() + 
    geom_point(data=data)
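
A likely explanation for the negative predictions mentioned in the comments: for a glm, predict() returns values on the link scale by default (type = "link"), which for a binomial model with the default logit link means log-odds, and log-odds are negative whenever the proportion is below 0.5. Passing type = "response" maps them back to proportions. The following is a minimal sketch, assuming the same data as in the question; it rebuilds the data with base R's data.frame instead of tibble so that it runs on its own, and the names newdata, log_odds and prop_red are introduced here for illustration only.

```r
# Rebuild the example data from the question (base R only, no tidyverse)
data <- data.frame(year = 2011:2020, reds = 1:10, total = 40)

# Binomial GLM with a (successes, failures) response; the link is logit by default
model <- glm(cbind(reds, total - reds) ~ year, family = binomial, data = data)

newdata <- data.frame(year = 2000:2030)

# Default type = "link": predictions on the log-odds scale, which can be negative
log_odds <- predict(model, newdata = newdata)

# type = "response": predictions mapped back to proportions strictly between 0 and 1
prop_red <- predict(model, newdata = newdata, type = "response")

# The two scales are related by the inverse logit, plogis()
all.equal(unname(plogis(log_odds)), unname(prop_red))
```

Because the response scale is bounded, the fitted curve flattens toward 0 and 1 when extrapolated far outside the observed years instead of crossing those limits.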