我一直在使用SAS很長一段時間,現在我想翻譯我的代碼在R.我需要幫助做到以下幾點:蒙特卡羅模擬,引導和迴歸中的R
- 生成若干自舉樣品
- 運行在每個樣品
- 存儲在一個新的數據集的參數的線性迴歸模型的複製樣品
我編輯的代碼,更清晰。 我用了很多for循環,我知道並不總是推薦。這個過程非常緩慢
是否有代碼/軟件包(例如,應用家庭功能,「caret」軟件包)可以使這個非常乾淨的高效/快,尤其是如samplesize * bootsample> 1000萬
任何幫助將不勝感激。
samplesize <- 200
bootsize<- 500
myseed <- 123
#generating a fake dataset
id=1:n
set.seed(myseed)
x <- rnorm(samplesize, 5, 5)
y <- rnorm(samplesize, 2 + 0.4*x, 0.5)
data <- data.frame(id, x, y)
head(data)
id x y
1 1 2.197622 3.978454
2 2 3.849113 4.195852
3 3 12.793542 6.984844
4 4 5.352542 4.412614
5 5 5.646439 4.051405
6 6 13.575325 7.192007
# generate bootstrap samples
bootstrap <- function(nbootsamples, data, seed) {
bootdata <- data.frame() #to initialize it
set.seed(seed)
for (i in 1:nbootsamples) {
replicate <- i
bootstrapIndex <- sample(1:nrow(data), replace = TRUE)
datatemp <- data[bootstrapIndex, ]
tempall <- cbind(replicate, datatemp)
bootdata <- rbind(bootdata, tempall)
}
return(bootdata)
}
bootdata <- bootstrap(nbootsamples=bootsize, data=data, seed=myseed)
bootdata <- dplyr::arrange(bootdata, replicate, id)
head(bootdata)
#The data should look like this
replicate id x y
1 1 1 2.197622 3.978454
2 1 3 12.793542 6.984844
3 1 5 5.646439 4.051405
4 1 9 1.565736 3.451748
5 1 10 2.771690 3.081662
6 1 10 2.771690 3.081662
#Model-fitting and saving coefficient and means
modelFitting <- function(y, x, data) {
modeltemp <- glm(y ~ x,
data = data,
family = gaussian('identity'))
Inty <- coef(modeltemp)["(Intercept)"]
betaX <- coef(modeltemp)["x"]
sdy <- sd(residuals.glm(modeltemp))
data.frame(Inty, betaX, sdy, row.names = NULL)
}
saveParameters <- function(nbootsamples, data, seed) {
parameters <- data.frame() #to initialize it
for (i in 1:length(unique(data$replicate))) {
replicate <- i
datai <- data[ which(data$replicate==i),]
datatemp <- modelFitting(y, x,data=datai)
meandata <- data.frame(Pr_X=mean(datai$x))
tempall <- cbind(replicate, datatemp, meandata)
parameters <- rbind(parameters, tempall)
}
return(parameters)
}
parameters <- saveParameters(nbootsamples=bootsize, data=bootdata, seed=myseed)
head(parameters)
#Ultimately all I want is my final dataset to look like the following
replicate Inty betaX sdy Pr_X
1 1 2.135529 0.3851757 0.5162728 4.995836
2 2 1.957152 0.4094682 0.5071635 4.835884
3 3 2.044257 0.3989742 0.4734178 5.111185
4 4 2.093452 0.3861861 0.4921470 4.741299
5 5 2.017825 0.4037699 0.5240363 4.931793
6 6 2.026952 0.3979731 0.4898346 5.502320
支持對於連續因變量,不應該家族=高斯(「身份」)? –
謝謝你Len Greski。我會修復它 – SimRock