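Rolling sum of multiple variables using dplyr
I want to compute rolling sums of multiple variables for multiple subjects in a dataset. I know how to do this with the plyr package; however, given the length of the dataset, the number of variables, and the number of different rolling windows I am trying to compute (2-day, 3-day, 4-day, etc.), I am wondering whether anyone has a more time-efficient way to do this in dplyr.
My data looks similar to this: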
Subjects <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
Day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
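# n = length(Day) states the sample size explicitly (rnorm() would also infer it from a vector)
variable.A <- rnorm(n = length(Day), mean = 20, sd = 5)
variable.B <- rnorm(n = length(Day), mean = 50, sd = 15)
variable.C <- rnorm(n = length(Day), mean = 100, sd = 33)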
dat <- data.frame(Subjects, Day, variable.A, variable.B, variable.C)
dat
Subjects Day variable.A variable.B variable.C
1 1 1 20.17676 72.44022 56.69915
2 1 2 14.11462 46.28473 117.00864
3 1 3 15.30440 72.43752 93.17489
4 1 4 13.72422 66.76744 101.26422
5 1 5 21.97695 69.50480 102.61979
6 2 1 14.45742 32.69106 82.37268
7 2 2 33.37783 65.06782 97.17744
8 2 3 13.57833 26.37183 89.38218
9 2 4 23.01717 55.83446 147.85362
10 2 5 14.06008 32.00396 48.73060
11 3 1 14.57199 60.29746 87.07977
12 3 2 15.77413 77.04517 132.17910
13 3 3 30.05661 30.62220 171.35998
14 3 4 24.65348 53.96450 74.99875
15 3 5 26.93699 57.06393 36.81901
An example of the code I have in mind is:
library(plyr)
library(RcppRoll)
summarize <- ddply(dat, "Subjects", mutate,
                   Two.Day.Roll.A = roll_sum(variable.A, 2, align = "right", fill = NA),
                   Two.Day.Roll.B = roll_sum(variable.B, 2, align = "right", fill = NA),
                   Two.Day.Roll.C = roll_sum(variable.C, 2, align = "right", fill = NA))
Subjects Day variable.A variable.B variable.C Two.Day.Roll.A Two.Day.Roll.B Two.Day.Roll.C
1 1 1 15.324798 24.83074 137.48853 NA NA NA
2 1 2 12.112943 58.86094 86.87454 27.43774 83.69168 224.3631
3 1 3 16.179328 57.95450 68.71333 28.29227 116.81544 155.5879
4 1 4 15.319750 38.13721 79.43194 31.49908 96.09171 148.1453
5 1 5 21.791452 61.99368 134.30205 37.11120 100.13089 213.7340
6 2 1 10.937461 63.83164 95.04865 NA NA NA
7 2 2 14.642376 79.12452 107.13699 25.57984 142.95616 202.1856
8 2 3 17.519905 52.75490 100.62811 32.16228 131.87942 207.7651
9 2 4 23.190371 37.56950 179.72763 40.71028 90.32440 280.3557
10 2 5 13.729350 46.95616 72.14179 36.91972 84.52566 251.8694
11 3 1 9.609171 74.51140 130.90005 NA NA NA
12 3 2 27.542897 14.36222 133.87630 37.15207 88.87363 264.7763
13 3 3 18.750015 60.46183 130.44314 46.29291 74.82405 264.3194
14 3 4 17.461882 52.65797 176.30620 36.21190 113.11979 306.7493
15 3 5 31.244564 62.41614 78.82916 48.70645 115.07411 255.1354
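This works, but as I said, the original data has many more columns, and I would like to go on and compute 3-day sums, 4-day sums, etc. on all of those variables. Also, my original data contains some NAs, so maybe there is a way to account for that?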
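For the NA part, one possibility is the `na.rm` argument of `RcppRoll::roll_sum()`. The snippet below is only a minimal sketch on a made-up vector `x` (not the real data), assuming that dropping missing values inside each window is the behaviour wanted:

```r
library(RcppRoll)

# Hypothetical toy vector with one missing value
x <- c(1, 2, NA, 4, 5)

# Default (na.rm = FALSE): a window that contains an NA sums to NA
with_na <- roll_sum(x, n = 2, align = "right", fill = NA)

# na.rm = TRUE: NAs inside a window are ignored when summing;
# the leading NA still comes from fill = NA (no complete window yet)
without_na <- roll_sum(x, n = 2, align = "right", fill = NA, na.rm = TRUE)

print(with_na)
print(without_na)
```

Whether ignoring NAs is appropriate depends on the data: a 2-day sum built from a single observed day is not directly comparable to one built from two.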
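I have tried using the mutate_each() function in the dplyr package, but I can't seem to get the syntax right.
Thanks.
Thanks. Which option should I choose to wrap it? I'll fix it. – user3585829
Got it. Thanks. Will fix it. – user3585829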
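A possible dplyr version is sketched below. It assumes dplyr 1.0.0 or later, where `across()` is the replacement for the deprecated `mutate_each()`. The seed, the `widths` vector, and the `roll<k>.` column prefix are illustrative choices and not part of the original data.

```r
library(dplyr)
library(RcppRoll)

set.seed(1)  # assumption: fixed seed so the toy data is reproducible
dat <- data.frame(
  Subjects   = rep(1:3, each = 5),
  Day        = rep(1:5, times = 3),
  variable.A = rnorm(15, mean = 20, sd = 5),
  variable.B = rnorm(15, mean = 50, sd = 15),
  variable.C = rnorm(15, mean = 100, sd = 33)
)

widths <- c(2, 3, 4)  # hypothetical rolling window sizes, in days

rolled <- group_by(dat, Subjects)
for (w in widths) {
  rolled <- rolled %>%
    mutate(across(
      # Only the original columns match: the new columns start with "roll",
      # so later iterations do not roll over already-rolled columns
      starts_with("variable."),
      ~ roll_sum(.x, n = w, align = "right", fill = NA, na.rm = TRUE),
      .names = paste0("roll", w, ".{.col}")
    ))
}
rolled <- ungroup(rolled)

print(names(rolled))
```

Each pass of the loop adds one rolled column per variable and window size (for example `roll2.variable.A`), computed within each subject, so adding more variables or windows does not require writing more lines.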