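Edit: Major clean-up throughout. You might look at `cut`. By default, `cut` makes left-open, right-closed intervals, and that can be changed with the appropriate argument (`right`). To use your example: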
x <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)
cutVec <- c(vec, max(x)) # for cut, range of vec should cover all of x
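To see the interval convention in isolation, here is a minimal sketch with made-up values (`breaks` and `vals` are illustrative names, not part of the original example). It only shows how a value sitting exactly on a break is binned under each setting of `right`.

```r
# Illustrative values only; `breaks` and `vals` are hypothetical names.
breaks <- c(0, 5, 10)
vals   <- c(5, 10)

# Default (right = TRUE): the intervals are (0,5] and (5,10]
cut(vals, breaks, labels = FALSE)                 # 5 -> bin 1, 10 -> bin 2

# right = FALSE: the intervals are [0,5) and [5,10)
cut(vals, breaks, right = FALSE, labels = FALSE)  # 5 -> bin 2, 10 -> NA
```

With `right = FALSE` the value 10 falls outside the last interval, which is why it comes back as `NA`.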
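Now create four functions that should all do the same thing: two from the OP, one from Josh O'Brien, and one using `cut`. Two arguments to `cut` have been changed from their defaults: `include.lowest = TRUE` makes the smallest (leftmost) interval closed on both sides, and `labels = FALSE` makes `cut` return just the integer bin codes instead of creating a factor, which it would otherwise do.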
findInterval.rightClosed <- function(x, vec, ...) {
    fi <- findInterval(x, vec, ...)
    # step back one bin when x sits exactly on a break
    fi - (x == vec[fi])
}
findInterval.rightClosed2 <- function(x, vec, ...) {
    length(vec) - findInterval(-x, -rev(vec), ...)
}
cutFun <- function(x, vec) {
    cut(x, vec, include.lowest = TRUE, labels = FALSE)
}
# The body of fiFun is a contribution by Josh O'Brien that got fed to the ether.
fiFun <- function(x, vec) {
    # nudge each break up slightly so values equal to a break fall in the lower bin
    findInterval(x, vec * (1 + .Machine$double.eps))
}
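As a rough illustration of the two non-default arguments that `cutFun` passes to `cut`, the sketch below uses made-up values (`breaks` and `vals` are hypothetical names) where one value sits exactly on the lowest break.

```r
# Hypothetical break points and values, chosen so that 2 equals the lowest break.
breaks <- c(2, 5, 6)
vals   <- c(2, 5)

# Defaults: returns a factor, and 2 is NA because the first interval is (2,5]
cut(vals, breaks)

# include.lowest = TRUE: the first interval becomes [2,5], so 2 is binned
cut(vals, breaks, include.lowest = TRUE)

# labels = FALSE: plain integer bin codes instead of a factor
cut(vals, breaks, include.lowest = TRUE, labels = FALSE)   # 1 1
```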
Do all the functions return the same result? Yes. (Note that `cutVec` is used for `cutFun`.)
mapply(identical, list(findInterval.rightClosed(x, vec)),
list(findInterval.rightClosed2(x, vec), cutFun(x, cutVec), fiFun(x, vec)))
# [1] TRUE TRUE TRUE
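The reason `cutFun` is given `cutVec` here can be seen in a short sketch, assuming the same `x` and `vec` as in the example (redefined below so the snippet runs on its own): `cut` returns `NA` for values that lie beyond the last break, whereas `findInterval` assigns them to the last bin.

```r
# Same values as in the example above, repeated so this runs standalone.
x   <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)

# 37 and 52 lie above max(vec), so cut has no interval for them
cut(x, vec, include.lowest = TRUE, labels = FALSE)
# 1 2 3 3 3 NA NA

# Appending max(x) as a final break gives them a bin
cut(x, c(vec, max(x)), include.lowest = TRUE, labels = FALSE)
# 1 2 3 3 3 4 4
```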
Now a more demanding vector to bin:
x <- rpois(2e6, 10)
vec <- c(-Inf, quantile(x, seq(.2, 1, .2)))
Test that the results are identical (note the use of `unname`):
mapply(identical, list(unname(findInterval.rightClosed(x, vec))),
list(findInterval.rightClosed2(x, vec), cutFun(x, vec), fiFun(x, vec)))
# [1] TRUE TRUE TRUE
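A likely reason `unname` is needed: `quantile` returns a named vector, and those names carry through the comparison inside `findInterval.rightClosed`, while `identical` also compares attributes. The sketch below uses a small hypothetical sample (`x_small`, `vec_small`, `fi` and `res` are illustrative names, and the seed is arbitrary).

```r
set.seed(1)                         # arbitrary seed, only for reproducibility
x_small   <- rpois(20, 10)          # hypothetical small sample
vec_small <- c(-Inf, quantile(x_small, seq(.2, 1, .2)))
names(vec_small)                    # "" "20%" "40%" "60%" "80%" "100%"

# Same computation as the body of findInterval.rightClosed
fi  <- findInterval(x_small, vec_small)
res <- fi - (x_small == vec_small[fi])

is.null(names(res))                 # FALSE: names inherited from vec_small
is.null(names(unname(res)))         # TRUE: unname() strips them
```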
And benchmark:
library(microbenchmark)
microbenchmark(findInterval.rightClosed(x, vec), findInterval.rightClosed2(x, vec),
cutFun(x, vec), fiFun(x, vec), times = 50)
# Unit: milliseconds
#                                expr       min        lq    median        uq       max
# 1                    cutFun(x, vec)  35.46261  35.63435  35.81233  36.68036  53.52078
# 2                     fiFun(x, vec)  51.30158  51.69391  52.24277  53.69253  67.09433
# 3  findInterval.rightClosed(x, vec) 124.57110 133.99315 142.06567 155.68592 176.43291
# 4 findInterval.rightClosed2(x, vec)  79.81685  82.01025  86.20182  95.65368 108.51624
From this run, `cut` appears to be the fastest.
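What do you think about `findInterval(x, c(-Inf, head(vec, -1)))`? – sgibb
@sgibb That doesn't appear to do the trick; I added an example where your results are not the same. –
I'm slightly confused here, but does `findInterval(x-1, vec)` do what you're looking for? – thelatemail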