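Short answer: use cut2 from the Hmisc package.
Long answer
Example attempt: split dat, which holds 1000 unique values, into 100 equal groups of 10.
Doesn't work with cut: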
# dummy data
set.seed(321)
dat <- rexp(1000)
# all unique values
length(unique(dat))
[1] 1000
cut produces 100 levels
init_res <- cut(dat, 100)
length(unique(levels(init_res)))
[1] 100
but it does not split the data into equally sized groups
init_grps <- split(dat, cut(dat, 100))
table(unlist(lapply(init_grps, length)))
0 1 2 3 4 5 6 7 9 10 11 13 15 17 18 19 22 23 24 25 27 37 38 44 47 50 63 71 72 77
42 9 8 4 1 3 1 3 2 1 2 1 1 1 2 1 1 1 2 2 2 1 1 1 1 1 1 2 1 1
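The uneven counts follow from how cut treats a single number passed as breaks: it divides the range of the data into that many equal-width intervals, so a skewed sample such as the output of rexp piles up in the lowest intervals. A minimal base-R sketch of that behaviour (the names toy, width_bins and bin_counts are only illustrative and not part of the original answer):

```r
# Illustrative toy data: nine small values and one large outlier
toy <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 10)

# breaks = 3 asks for three equal-width intervals spanning range(toy)
width_bins <- cut(toy, 3)

# Counts per interval: the middle interval is empty
bin_counts <- as.vector(table(width_bins))
print(bin_counts)  # 9 0 1
```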
Works with Hmisc::cut2
cut2 divides the vector into groups of equal size, as desired
require(Hmisc)
final_grps <- split(dat, cut2(dat, g=100))
table(unlist(lapply(final_grps, length)))
10
100
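With g = 100, cut2 forms quantile groups. For readers who cannot install Hmisc, roughly the same split can be sketched in base R by handing quantile break points to cut. This is an approximation under the assumption that all values are unique (as they are here), not a reimplementation of cut2; the names brks, q_bins and group_sizes are illustrative.

```r
set.seed(321)
dat <- rexp(1000)

# 101 break points at the 0%, 1%, ..., 100% quantiles
brks <- quantile(dat, probs = seq(0, 1, length.out = 101))

# include.lowest = TRUE keeps the minimum, which sits exactly on the first break;
# without it that value would be assigned NA
q_bins <- cut(dat, breaks = brks, include.lowest = TRUE)

# Number of observations in each of the 100 bins
group_sizes <- table(q_bins)
range(group_sizes)
```

If the data contain ties, the quantile breaks may not be unique and cut will stop with an error, so this sketch is only suitable for continuous data like the example.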
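If you like, you can store the result in a single object with one row per group (note that do.call(rbind, ...) returns a matrix rather than a data frame), for example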
foobar <- do.call(rbind, final_grps)
head(foobar)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[0.000611,0.00514) 0.004345915 0.002192086 0.004849693 0.002911516 0.003421753 0.003159641 0.004855366 0.0006111574
[0.005137,0.01392) 0.009178133 0.005137309 0.008347482 0.007072484 0.008732725 0.009379002 0.008818794 0.0110489833
[0.013924,0.02004) 0.014283326 0.014356782 0.013923721 0.014290554 0.014895342 0.017992638 0.015608931 0.0173707930
[0.020041,0.03945) 0.023047527 0.020437743 0.026353839 0.036159321 0.024371834 0.026629812 0.020793695 0.0214221779
[0.039450,0.05912) 0.043379064 0.039450453 0.050806316 0.054778805 0.040093806 0.047228050 0.055058519 0.0446634954
[0.059124,0.07362) 0.069671018 0.059124220 0.063242564 0.064505875 0.072344089 0.067196661 0.065575249 0.0634142853
[,9] [,10]
[0.000611,0.00514) 0.002524557 0.003155055
[0.005137,0.01392) 0.008287758 0.011683228
[0.013924,0.02004) 0.018537469 0.014847937
[0.020041,0.03945) 0.026233400 0.020040981
[0.039450,0.05912) 0.041310471 0.058449603
[0.059124,0.07362) 0.063608022 0.066316782
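The object printed above is a matrix with one row per group. If an actual data frame is preferred, one option is a long layout with one row per value and the group label in its own column. A small sketch using base R's stack, where toy_grps is a hypothetical stand-in for final_grps so that the example runs without Hmisc:

```r
# Hypothetical stand-in for final_grps: a named list of equally sized groups
toy_grps <- list("[1,3)" = c(1, 2), "[3,5]" = c(3, 4))

# Wide form: one row per group (a matrix, as in the output above)
wide <- do.call(rbind, toy_grps)

# Long form: one row per value, with the group label in a second column
long <- stack(toy_grps)
names(long) <- c("value", "group")
str(long)
```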
Hope this helps.
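Your data probably isn't uniformly distributed, so when you cut it like this some bins will end up with 0 values. You could do 'cut(dat, quantile(dat, probs = seq(0, 1, 1/1024)))' – jenesaisquoi
Or 'gtools::quantcut' –
Perhaps a better approach: 'split(dat[order(dat)], c(0, seq(length(dat))) %/% 2)', replacing 2 with the number of values you want in each bin – jenesaisquoi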
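The last suggestion, sorting the values and cutting them into consecutive runs, can be sketched as follows. It is a variation on the one-liner in the comment rather than a copy: the grouping index here is built with seq_along so that it has exactly as many elements as the data (the version in the comment is one element longer). The names per_bin, sorted, grp_id and rank_grps are illustrative.

```r
set.seed(321)
dat <- rexp(1000)

per_bin <- 10  # number of values wanted in each group

sorted <- sort(dat)

# 0 repeated per_bin times, then 1 repeated per_bin times, and so on,
# so each consecutive run of per_bin sorted values shares one index
grp_id <- (seq_along(sorted) - 1) %/% per_bin

rank_grps <- split(sorted, grp_id)
table(lengths(rank_grps))
```

Unlike the quantile-based approaches, this labels the groups 0, 1, 2, ... instead of by interval, and if length(dat) is not a multiple of per_bin the final group is simply smaller.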