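A more efficient approach is to first compute the row-wise cumulative sums of your weights, normalized so that each row ends at 1, then draw a number between 0 and 1 and see where it falls among those cumulative sums. This way you only need a single call to `runif` to get all of your random numbers, whereas the other approaches make 1000 calls to `sample` (one per row).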
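To make the lookup concrete, here is a minimal single-row sketch. The weights `w` and the fixed value of `x` are illustrative assumptions, not from the original answer; `x` is fixed so the result is predictable instead of coming from `runif(1)`.

```r
# One row of illustrative weights and the values to choose from
w <- c(10, 20, 30, 40)
y <- c("a", "b", "c", "d")

# Normalized cumulative sums: 0.1, 0.3, 0.6, 1.0
cumul <- cumsum(w) / sum(w)

# Stand-in for runif(1), fixed here for illustration
x <- 0.45

# Count how many cumulative sums lie below x, then add 1 to get the index:
# 0.45 > 0.1 and 0.45 > 0.3, so i is 3
i <- sum(x > cumul) + 1L
y[i]  # "c"
```

The vectorized version performs this same comparison for every row at once, with one uniform draw per row.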
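```r
y <- c("a", "b", "c", "d")
Weight <- matrix(sample(1:100, 1000 * 4, TRUE), 1000, 4)
x <- runif(nrow(Weight))  # one uniform draw per row
# Row-wise cumulative sums, normalized so that each row ends at 1
cumul.w <- Weight %*% upper.tri(diag(ncol(Weight)), diag = TRUE)/rowSums(Weight)
# (number of cumulative sums below x) + 1 is the selected column index
i <- rowSums(x > cumul.w) + 1L
selection <- y[i]
```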
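Also note how I computed the cumulative sums by multiplying by a triangular matrix rather than with the slower `apply(Weight, 1, cumsum)`. Everything is vectorized, so it should be faster than using `apply` or a `for` loop.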
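A quick way to check that the triangular-matrix product gives the same row-wise cumulative sums as `apply(Weight, 1, cumsum)` is to compare the two directly. This is a sketch; the seed and the names `U`, `cs_matmul` and `cs_apply` are illustrative. Note that `apply()` returns one column per input row, so its result has to be transposed with `t()` before comparing.

```r
set.seed(1)
Weight <- matrix(sample(1:100, 1000 * 4, TRUE), 1000, 4)

# U[i, j] is TRUE when i <= j, so column j of Weight %*% U
# sums columns 1..j of Weight (logicals are treated as 0/1)
U <- upper.tri(diag(ncol(Weight)), diag = TRUE)

# Row-wise cumulative sums via a single matrix product
cs_matmul <- Weight %*% U

# The same via apply(); transpose because apply() returns rows as columns
cs_apply <- t(apply(Weight, 1, cumsum))

all.equal(cs_matmul, cs_apply, check.attributes = FALSE)
```

The matrix product does all rows in one call to optimized linear-algebra code, while `apply()` calls `cumsum` once per row from R.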
Benchmark against `apply` and `for`:
```r
# Vectorized: one runif() call plus a cumulative-sum lookup
f_runif <- function(Weight, y) {
  x <- runif(nrow(Weight))
  cumul.w <- Weight %*% upper.tri(diag(ncol(Weight)), diag = TRUE) /
    rowSums(Weight)
  i <- rowSums(x > cumul.w) + 1L
  y[i]
}

# One sample() call per row inside a for loop
f_for <- function(Weight, y) {
  selection <- rep(NA, nrow(Weight))
  for (i in 1:nrow(Weight)) {
    selection[i] <- sample(y, 1, prob = Weight[i, ])
  }
  selection  # return the result (the loop itself returns NULL)
}

# One sample() call per row via apply()
f_apply <- function(Weight, y) {
  apply(Weight, 1, function(w) sample(y, 1, prob = w))
}

y <- c("a", "b", "c", "d")
Weight <- matrix(sample(1:100, 1000 * 4, TRUE), 1000, 4)

library(microbenchmark)
microbenchmark(f_runif(Weight, y),
               f_for(Weight, y),
               f_apply(Weight, y))
# Unit: microseconds
#                expr       min        lq    median         uq       max neval
#  f_runif(Weight, y)   223.635   231.111   274.531   281.2165  1443.208   100
#    f_for(Weight, y) 10220.674 11238.660 11574.039 11917.1610 14583.028   100
#  f_apply(Weight, y)  9006.974 10016.747 10509.150 10879.9245 27060.189   100
```
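The benchmark only measures speed. As a rough sanity check that `f_runif` also draws with the intended probabilities, one can give every row the same weights and compare the observed frequencies with the normalized weights. The weights, seed and sample size below are illustrative assumptions; because the check is statistical, it should be judged with a tolerance rather than exact equality.

```r
f_runif <- function(Weight, y) {
  x <- runif(nrow(Weight))
  cumul.w <- Weight %*% upper.tri(diag(ncol(Weight)), diag = TRUE) /
    rowSums(Weight)
  i <- rowSums(x > cumul.w) + 1L
  y[i]
}

set.seed(42)
y <- c("a", "b", "c", "d")

# Every row has weights 10, 20, 30, 40, i.e. target probabilities 0.1 to 0.4
n <- 100000
Weight <- matrix(rep(c(10, 20, 30, 40), each = n), n, 4)

draws <- f_runif(Weight, y)

# Observed frequency of each value, in the order of y
freq <- as.vector(table(factor(draws, levels = y))) / n
round(freq, 3)  # should be close to 0.1, 0.2, 0.3, 0.4
```

With 100000 draws the standard error of each frequency is below 0.002, so deviations larger than about 0.01 would suggest a bug in the lookup.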
How do you define "efficient"? – flodel