2015-08-16 139 views
1

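R: sum a vector over a vector of conditions

I want to get a vector containing, for each condition, the sum over the elements that satisfy it.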

values = runif(5000)
bin = seq(0, 0.9, by = 0.1)
sum(values < bin)

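I expected sum to return 10 values: for each element of "bin", the sum over the "values" elements that satisfy the "<" condition. However, it returns only one value. How can I get this result without using a while loop?

Answers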

4

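My understanding is that, for each value in bin, you want the number of elements of values that are less than that value. So I think you want vapply() here: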

vapply(bin, function(x) sum(values < x), 1L) 
# [1] 0 497 1025 1501 1981 2461 2955 3446 3981 4526 
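For context, here is a minimal sketch of why the original sum(values < bin) gives a single number. R recycles the shorter vector bin along values, so the comparison is element by element rather than every bin edge against every value. The seed and the names cmp, recycled, single and per_bin are assumptions added for illustration only.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
values <- runif(5000)
bin <- seq(0, 0.9, by = 0.1)

# `bin` (length 10) is recycled 500 times to match `values` (length 5000),
# so values[i] is compared only with bin[((i - 1) %% 10) + 1].
cmp <- values < bin
length(cmp)  # 5000: one logical per element of `values`

# Spelling the recycling out by hand gives the same logical vector.
recycled <- rep(bin, length.out = length(values))
identical(cmp, values < recycled)

# sum() then collapses that single logical vector into one count.
single <- sum(cmp)

# vapply() instead evaluates the comparison once per bin edge.
per_bin <- vapply(bin, function(x) sum(values < x), 1L)
length(per_bin)  # 10: one count per element of `bin`
```

Because 5000 is a multiple of 10, the recycling happens without any warning, which makes the mistake easy to miss.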

If you want a small table for reference, you can add names:

v <- vapply(bin, function(x) sum(values < x), 1L) 
setNames(v, bin) 
# 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 
# 0 497 1025 1501 1981 2461 2955 3446 3981 4526 
+0

In my answer, should I be getting the same results as in the cumsum column, or are you performing a different calculation? Thanks. – mpalanco

+0

No, because runif() is used, neither of us will get the same results –

+0

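Sorry, I did not mention that I set the same seed when running your code. Now I see: you are computing a cumulative count and I am computing a cumulative sum. I have included both in my answer. – mpalanco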

2

Using tapply with a cut()-constructed index vector seems to deliver:

tapply(values, cut(values, bin), sum)
  (0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6]
 25.43052  71.06897 129.99698 167.56887 222.74620 277.16395
(0.6,0.7] (0.7,0.8] (0.8,0.9]
332.18292 368.49341 435.01104

Although I am guessing you would want the cut vector to extend to 1.0:

bin = seq(0, 1, by = 0.1)
tapply(values, cut(values, bin), sum)
  (0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6]
 25.48087  69.87902 129.37348 169.46013 224.81064 282.22455
(0.6,0.7] (0.7,0.8] (0.8,0.9]   (0.9,1]
335.43991 371.60885 425.66550 463.37312

I see that I understood the question differently than Richard did. If you want his result, you can use cumsum on my result.
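As a rough sketch of how the two readings relate (the seed and all variable names below are assumptions, not part of the original answers): taking cumsum over the per-bin sums gives cumulative sums, while taking cumsum over per-bin counts gives the cumulative counts that the vapply() approach produces.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
values <- runif(5000)
bin <- seq(0, 1, by = 0.1)
groups <- cut(values, bin)

# Per-bin sums (as in this answer) and per-bin counts.
bin_sums <- tapply(values, groups, sum)
bin_counts <- tapply(values, groups, length)

# Running totals of each.
cum_sums <- cumsum(bin_sums)
cum_counts <- cumsum(bin_counts)

# The cumulative counts agree with the vapply() approach evaluated at the
# upper edge of each bin (barring a value landing exactly on an edge).
vapply_counts <- vapply(bin[-1], function(x) sum(values < x), 1L)
all(cum_counts == vapply_counts)
```

The only difference between the two pipelines is the summary function passed to tapply (sum versus length).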

0

Using dplyr:

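library(dplyr)
set.seed(1)
# values and bin as in the question; df was not defined in the snippet,
# so its construction here is an assumption inferred from the output below
values <- runif(5000)
bin <- seq(0, 0.9, by = 0.1)
df <- data.frame(values, groups = cut(values, bin))  # values above 0.9 get NA
df %>% group_by(groups) %>%
    summarise(count = n(), sum = sum(values)) %>%
    mutate(cumcount = cumsum(count), cumsum = cumsum(sum))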

Output:

      groups count       sum cumcount     cumsum
1    (0,0.1]   537  26.43445      537   26.43445
2  (0.1,0.2]   504  75.12241     1041  101.55686
3  (0.2,0.3]   496 124.56939     1537  226.12625
4  (0.3,0.4]   522 184.28862     2059  410.41487
5  (0.4,0.5]   505 226.77295     2564  637.18782
6  (0.5,0.6]   486 267.47094     3050  904.65876
7  (0.6,0.7]   423 275.87466     3473 1180.53342
8  (0.7,0.8]   478 359.65217     3951 1540.18559
9  (0.8,0.9]   513 436.04508     4464 1976.23067
10        NA   536 509.21853     5000 2485.44920
3

I personally prefer data.table over tapply or vapply, and findInterval over cut:

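library(data.table)
set.seed(1)
values <- runif(5000)          # values and bin as defined in the question
bin <- seq(0, 0.9, by = 0.1)
# findInterval() returns, for each value, the index of the bin it falls in
dt <- data.table(values, groups = findInterval(values, bin))
setkey(dt, groups)             # sort by group so the running totals follow bin order
# per-group count and sum, then cumulative count and cumulative sum
dt[, .(n = .N, v = sum(values)), by = groups][, .(cumsum(n), cumsum(v))]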
#       V1         V2
#  1:  537   26.43445
#  2: 1041  101.55686
#  3: 1537  226.12625
#  4: 2059  410.41487
#  5: 2564  637.18782
#  6: 3050  904.65876
#  7: 3473 1180.53342
#  8: 3951 1540.18559
#  9: 4464 1976.23067
# 10: 5000 2485.44920

cbind(vapply(bin, function(x) sum(values < x), 1L)[-1],
      cumsum(tapply(values, cut(values, bin), sum)))
#           [,1]       [,2]
# (0,0.1]    537   26.43445
# (0.1,0.2] 1041  101.55686
# (0.2,0.3] 1537  226.12625
# (0.3,0.4] 2059  410.41487
# (0.4,0.5] 2564  637.18782
# (0.5,0.6] 3050  904.65876
# (0.6,0.7] 3473 1180.53342
# (0.7,0.8] 3951 1540.18559
# (0.8,0.9] 4464 1976.23067
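
On the preference for findInterval over cut stated above, a small sketch of how the two differ may help. The sample vector x and the names fi and ct are hypothetical, chosen so that one value sits exactly on a bin edge and one lies beyond the last break.

```r
x <- c(0.05, 0.1, 0.15, 0.95)  # hypothetical values; 0.1 sits exactly on an edge
bin <- seq(0, 0.9, by = 0.1)

# findInterval(): returns integer indices, treats intervals as closed on the
# left, [a, b), and still assigns an index to values beyond the last break.
fi <- findInterval(x, bin)
fi  # 1 2 2 10

# cut(): returns a factor, treats intervals as closed on the right, (a, b],
# by default, and gives NA for values outside the breaks.
ct <- cut(x, bin)
as.integer(ct)  # 1 1 2 NA
```

This is why the data.table result has a tenth row for values above 0.9, whereas the cut-based grouping puts those values in an NA group (or drops them under tapply).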