2013-01-12 100 views

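Loess and glm plotting with ggplot2

I am trying to plot the model predictions from a binary-choice glm against the empirical probability, using data from the Titanic. To show differences across class and sex I am using faceting, but there are two things I can't figure out. First, I would like to restrict the loess curve to lie between 0 and 1, but if I add ylim(c(0,1)) to the end of the plot, the ribbon around the loess curve gets cut off wherever one side of it falls outside the bounds. Second, I would like to draw a line in each facet from the minimum x-value (the glm's predicted probability) to the maximum x-value (within the same facet) and y = 1, so as to show the glm predicted probability.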

[Image: loess-titanic]

#info on this data http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3info.txt 
load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.sav')) 
titanic <- titanic3[ ,-c(3,8:14)]; rm(titanic3) 
titanic <- na.omit(titanic) #probably missing completely at random 
titanic$age <- as.numeric(titanic$age) 
titanic$sibsp <- as.integer(titanic$sibsp) 
titanic$survived <- as.integer(titanic$survived) 

training.df <- titanic[sample(nrow(titanic), nrow(titanic)/2), ] 
validation.df <- titanic[!(row.names(titanic) %in% row.names(training.df)), ] 


glm.fit <- glm(survived ~ sex + sibsp + age + I(age^2) + factor(pclass) + sibsp:sex, 
       family = binomial(link = "probit"), data = training.df) 

glm.predict <- predict(glm.fit, newdata = validation.df, se.fit = TRUE, type = "response") 

plot.data <- data.frame(mean = glm.predict$fit, response = validation.df$survived, 
         class = validation.df$pclass, sex = validation.df$sex) 

library(ggplot2) 
# ylim() here is what cuts off the loess ribbon described above 
ggplot(data = plot.data, aes(x = as.numeric(mean), y = as.integer(response))) + geom_point() + 
     stat_smooth(method = "loess", formula = y ~ x) + 
     facet_wrap(~ class + sex, scales = "free") + ylim(c(0,1)) + 
     xlab("Predicted Probability of Survival") + ylab("Empirical Survival Rate") 

Answers


The answer to your first question is to use coord_cartesian(ylim=c(0,1)) rather than ylim(0,1); this is a moderately frequent FAQ.
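To illustrate the difference, here is a minimal sketch on made-up data (the `toy` data frame is hypothetical, not the Titanic set). It assumes ggplot2's default behaviour, where limits set through ylim() turn out-of-range values into NA, whereas coord_cartesian() only zooms the view. layer_data() is used to inspect the band the smoother would draw.

```r
library(ggplot2)

# Toy data (hypothetical): a 0/1 response against a predictor in (0, 1),
# so the loess confidence band is expected to spill outside [0, 1].
toy <- data.frame(x = seq(0.05, 0.95, length.out = 40))
toy$y <- as.integer(toy$x > 0.5)

base <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x)

clipped <- base + ylim(0, 1)                       # sets scale limits
zoomed  <- base + coord_cartesian(ylim = c(0, 1))  # only zooms the view

# Layer 2 is the smoother; pull out the data behind its ribbon.
band_clipped <- suppressWarnings(layer_data(clipped, 2))
band_zoomed  <- layer_data(zoomed, 2)

# Count ribbon points that were turned into NA in each version.
sum(is.na(band_clipped$ymin) | is.na(band_clipped$ymax))
sum(is.na(band_zoomed$ymin)  | is.na(band_zoomed$ymax))
```

If the assumption holds, the first count is positive and the second is zero, which is why the ribbon stays intact under coord_cartesian().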

As for the second question, there may be a way to do this within ggplot, but it was easier for me to summarize the data externally:

# base plot: points plus loess smoother, one facet per class/sex combination 
g0 <- ggplot(data = plot.data, aes(x = mean, y = response)) + geom_point() + 
      stat_smooth(method = "loess") + 
      facet_wrap(~ class + sex, scales = "free") + 
      coord_cartesian(ylim=c(0,1))+   # zoom without dropping the ribbon 
      labs(x="Predicted Probability of Survival", 
        y="Empirical Survival Rate") 

(I shortened your code slightly by eliminating some default values and using labs.)

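library(plyr)  # provides ddply() 
# per-facet range of the predicted probabilities 
ss <- ddply(plot.data,c("class","sex"),summarise,minx=min(mean),maxx=max(mean)) 
# y = x reference segment spanning each facet's predicted range 
g0 + geom_segment(data=ss,aes(x=minx,y=minx,xend=maxx,yend=maxx), 
        colour="red",alpha=0.5) 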
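
If plyr is not installed, the same per-facet summary can probably be built in base R. This is a minimal sketch, not part of the original answer: the `toy` data frame and its values are made up, and only its column names (mean, class, sex) mirror plot.data.

```r
# Toy stand-in for plot.data (hypothetical values, same column names).
toy <- data.frame(
  mean  = c(0.10, 0.40, 0.70, 0.20, 0.90, 0.50, 0.30, 0.80),
  class = c("1st", "1st", "1st", "1st", "2nd", "2nd", "2nd", "2nd"),
  sex   = c("female", "female", "male", "male",
            "female", "female", "male", "male")
)

# Minimum and maximum predicted probability per class/sex combination.
minx <- aggregate(mean ~ class + sex, data = toy, FUN = min)
maxx <- aggregate(mean ~ class + sex, data = toy, FUN = max)
names(minx)[names(minx) == "mean"] <- "minx"
names(maxx)[names(maxx) == "mean"] <- "maxx"

# One row per facet, with columns class, sex, minx, maxx.
ss <- merge(minx, maxx, by = c("class", "sex"))
ss
```

The resulting `ss` has the same columns as the ddply() version, so it should be usable as the data argument of geom_segment() in the same way.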

Thanks, it works if I substitute 'mean' for 'x'. Much appreciated. – Zach


Oops, edited... –


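Also, in the segment part (probably due to my poor explanation), the y-coordinates of the segment should match the x-coordinates rather than running from 0 to 1. – Zach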