First, some sample data:
## Sample data
nMen <- 50
nWomen <- 60
set.seed(124)
mydata <- data.frame(SEX = rep(c("female", "male"), times = c(nWomen, nMen)),
                     myValue = rnorm(nMen + nWomen),
                     ID = seq_len(nMen + nWomen))
Next, work out how many women and men you want in each sample. These counts must be whole numbers:
## Number of women and men for the sampling
nSampW <- (nWomen + nMen) * 0.9
nSampM <- (nWomen + nMen) * 0.1
## These should be integer (the following should be TRUE)
nSampW %% 1 == 0
nSampM %% 1 == 0
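If the chosen proportions do not divide the total evenly, one possible workaround is to round one count and derive the other by subtraction. This is only a sketch: `nTotal`, `pW`, `nSampW2` and `nSampM2` are hypothetical names and values, not part of the answer above.

```r
## Hypothetical values for illustration only
nTotal <- 115   # desired number of rows per sample
pW <- 0.9       # desired share of women

## Round one group, then derive the other so the two always sum to nTotal
nSampW2 <- round(nTotal * pW)
nSampM2 <- nTotal - nSampW2

## Both counts are whole numbers and add up to the requested total
stopifnot(nSampW2 %% 1 == 0, nSampM2 %% 1 == 0, nSampW2 + nSampM2 == nTotal)
```

The realised proportion will then differ slightly from `pW`, which is unavoidable when the total cannot be split exactly.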
Then set up a list to hold the results. The following allocates space for 200 samples:
## Set up results list
mySamp <- vector(mode = "list", length = 200)
Then loop, sampling within each sex from its row indices, using the numbers of women and men computed above:
## The loop
for (i in seq_along(mySamp)) {
    ## Get indices by SEX
    idxW <- which(mydata$SEX == "female")
    idxM <- which(mydata$SEX == "male")
    ## Sample corresponding number of rows from those indexes with replacement
    tempW <- mydata[sample(idxW, nSampW, replace = TRUE), ]
    tempM <- mydata[sample(idxM, nSampM, replace = TRUE), ]
    ## rbind back together and assign
    mySamp[[i]] <- rbind(tempW, tempM)
}
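The same stratified draw could also be wrapped in a function and repeated with `replicate()` instead of an explicit loop. The following is a self-contained sketch rather than the answer's own method; `drawStratified`, `toy` and `mySamp2` are hypothetical names introduced here.

```r
## drawStratified is a hypothetical helper: it samples nW female rows and
## nM male rows (with replacement) and stacks them into one data frame.
## Note: sample() treats a single number n as 1:n, so this assumes each
## group has more than one row.
drawStratified <- function(data, nW, nM) {
    idxW <- which(data$SEX == "female")
    idxM <- which(data$SEX == "male")
    rbind(data[sample(idxW, nW, replace = TRUE), ],
          data[sample(idxM, nM, replace = TRUE), ])
}

## Toy data with the same shape as the sample data above
set.seed(124)
toy <- data.frame(SEX = rep(c("female", "male"), times = c(60, 50)),
                  myValue = rnorm(110),
                  ID = seq_len(110))

## 200 samples, each with 99 women and 11 men; simplify = FALSE keeps a list
mySamp2 <- replicate(200, drawStratified(toy, 99, 11), simplify = FALSE)
```

Keeping the draw in a function makes it easier to change the group sizes or the number of replicates without touching the sampling logic.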
Finally, check that the proportions are correct:
# sapply(mySamp[1:10], function(x) prop.table(table(x$SEX)))
#        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# female  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9   0.9
# male    0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
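What is 'cox'? Why not 'nrow(data)'? What is 'smpl'? Is it a properly allocated list? Why don't you use 'smpl[[i]]'? Don't just say "it doesn't work"; specify the problem you are running into (an error? an unexpected result? a warning?) – nicola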
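Hi! Sorry, I have added the missing information to the original post. The code draws random samples, but not in the specified proportions. When I try to loop 200 times to create 200 data frames, it does not do it... (my original dataset is named "cox"; that was a copy-paste error) – user3018739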
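You should allocate the list before the loop with 'smpl <- vector("list", 200)' and, inside the loop, assign with 'smpl[[i]] <-' using double square brackets. What do you mean by "not keeping the proportions"? Because of sampling variance, it is normal for the samples you get not to be exactly 180-20. – nicola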
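The point about sampling variance can be illustrated with a short simulation. This is only a sketch on a hypothetical population of 900 women and 100 men (`pop` and `nFemale` are names introduced here, not the poster's data): drawing 200 rows without stratifying by sex gives a count of women that varies from draw to draw instead of being fixed at 180.

```r
## Hypothetical population: 900 women, 100 men (not from the original post)
set.seed(1)
pop <- rep(c("female", "male"), times = c(900, 100))

## Draw 200 rows ignoring sex, ten times, and count the women in each draw
nFemale <- replicate(10, sum(sample(pop, 200, replace = TRUE) == "female"))

## The counts scatter around 180 rather than hitting it every time
nFemale
range(nFemale)
```

Sampling separately within each sex, as in the answer above, is what pins the split at exactly 90% and 10%.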