2015-10-29 57 views

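Rolling sums of multiple variables using dplyr

I want to take rolling sums of multiple variables across multiple subjects in a data set. I know how to do this with the plyr package. However, given the length of the data set, the number of variables, and the number of different rolling windows I am trying to compute (2 days, 3 days, 4 days, etc.), I was wondering whether anyone has a more time-efficient way to do this in dplyr.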

My data looks similar to this:

Subjects <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
Day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
# n = length(Day) makes the number of random draws explicit
variable.A <- rnorm(n = length(Day), mean = 20, sd = 5)
variable.B <- rnorm(n = length(Day), mean = 50, sd = 15)
variable.C <- rnorm(n = length(Day), mean = 100, sd = 33)
dat <- data.frame(Subjects, Day, variable.A, variable.B, variable.C)
dat



   Subjects Day variable.A variable.B variable.C
1         1   1   20.17676   72.44022   56.69915
2         1   2   14.11462   46.28473  117.00864
3         1   3   15.30440   72.43752   93.17489
4         1   4   13.72422   66.76744  101.26422
5         1   5   21.97695   69.50480  102.61979
6         2   1   14.45742   32.69106   82.37268
7         2   2   33.37783   65.06782   97.17744
8         2   3   13.57833   26.37183   89.38218
9         2   4   23.01717   55.83446  147.85362
10        2   5   14.06008   32.00396   48.73060
11        3   1   14.57199   60.29746   87.07977
12        3   2   15.77413   77.04517  132.17910
13        3   3   30.05661   30.62220  171.35998
14        3   4   24.65348   53.96450   74.99875
15        3   5   26.93699   57.06393   36.81901

An example of the code for what I want is:

library(plyr) 
library(RcppRoll) 
summarize <- ddply(dat, "Subjects", mutate, 
    Two.Day.Roll.A = roll_sum(variable.A, 2, align = "right", fill = NA), 
    Two.Day.Roll.B = roll_sum(variable.B, 2, align = "right", fill = NA), 
    Two.Day.Roll.C = roll_sum(variable.C, 2, align = "right", fill = NA)) 

   Subjects Day variable.A variable.B variable.C Two.Day.Roll.A Two.Day.Roll.B Two.Day.Roll.C
1         1   1  15.324798   24.83074  137.48853             NA             NA             NA
2         1   2  12.112943   58.86094   86.87454       27.43774       83.69168       224.3631
3         1   3  16.179328   57.95450   68.71333       28.29227      116.81544       155.5879
4         1   4  15.319750   38.13721   79.43194       31.49908       96.09171       148.1453
5         1   5  21.791452   61.99368  134.30205       37.11120      100.13089       213.7340
6         2   1  10.937461   63.83164   95.04865             NA             NA             NA
7         2   2  14.642376   79.12452  107.13699       25.57984      142.95616       202.1856
8         2   3  17.519905   52.75490  100.62811       32.16228      131.87942       207.7651
9         2   4  23.190371   37.56950  179.72763       40.71028       90.32440       280.3557
10        2   5  13.729350   46.95616   72.14179       36.91972       84.52566       251.8694
11        3   1   9.609171   74.51140  130.90005             NA             NA             NA
12        3   2  27.542897   14.36222  133.87630       37.15207       88.87363       264.7763
13        3   3  18.750015   60.46183  130.44314       46.29291       74.82405       264.3194
14        3   4  17.461882   52.65797  176.30620       36.21190      113.11979       306.7493
15        3   5  31.244564   62.41614   78.82916       48.70645      115.07411       255.1354

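This works, but as I said, the original data has many more columns, and I would like to go on and compute 3-day sums, 4-day sums, and so on for all of those variables. Also, my original data contains some NAs, so perhaps there is a way to handle those as well?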

I have tried using the mutate_each() function in the dplyr package, but I can't seem to get the syntax right.

Thanks.

Thanks. Which option should I choose to wrap it? I'll fix it. – user3585829

Got it. Thanks. Will fix it. – user3585829

Answers

Here's the dplyr version:

library(dplyr) 
library(RcppRoll) 
dat %>% group_by(Subjects) %>% 
     mutate_each(funs(roll_sum(., 2, align = "right", fill=NA)), -Subjects, -Day) 

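Two small things: you don't need `-Subjects` (the grouping column is excluded automatically), and this overwrites the old columns, unlike the plyr result above. – Frank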

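It looks like the developers are aware of the latter issue but have not offered a workaround: https://github.com/hadley/dplyr/issues/712 The best I can come up with is `dat %>% group_by(Subjects) %>% mutate_each(funs("." = "(", roll = roll_sum(., 2, align = "right", fill = NA)), -Day)` – Frank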
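
The column-overwriting issue raised in these comments can probably be sidestepped in newer dplyr releases. What follows is only a sketch, assuming dplyr 1.0.0 or later, where across() and its .names argument are available; mutate_each() and funs() have since been deprecated. The names `windows` and `roll_fns` and the `.rollN` column suffix are illustrative choices, not something from the thread. It keeps the original columns and adds one rolling-sum column per variable and per window length (2, 3 and 4 days):

```r
library(dplyr)
library(RcppRoll)

# Example data shaped like the question's (values are random)
set.seed(1)
dat <- data.frame(
  Subjects   = rep(1:3, each = 5),
  Day        = rep(1:5, times = 3),
  variable.A = rnorm(15, mean = 20, sd = 5),
  variable.B = rnorm(15, mean = 50, sd = 15),
  variable.C = rnorm(15, mean = 100, sd = 33)
)

# Hypothetical helper list: one right-aligned rolling-sum function per window
windows <- 2:4
roll_fns <- lapply(windows, function(w) {
  force(w)  # capture the current window length inside the closure
  function(x) roll_sum(x, n = w, align = "right", fill = NA)
})
names(roll_fns) <- paste0("roll", windows)

# Apply every function to every variable.* column within each subject;
# .names controls the new column names, so the originals are not overwritten
out <- dat %>%
  group_by(Subjects) %>%
  mutate(across(starts_with("variable."), roll_fns, .names = "{.col}.{.fn}")) %>%
  ungroup()

names(out)
```

With three variables and three windows this adds nine columns, named like `variable.A.roll2`. Changing `windows` is the only edit needed to add or drop a window length.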

Great, thanks. If I have NAs in my original data, would I put the argument na.rm = T into the roll_sum() function? – user3585829
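
On the NA question: as far as I know, RcppRoll's roll_sum() does list an na.rm argument in its documentation, but it is worth confirming with ?roll_sum for the installed version. A minimal sketch on a made-up vector `x` (not the poster's data), comparing the default behaviour with na.rm = TRUE:

```r
library(RcppRoll)

# Toy vector with one missing value (illustrative only)
x <- c(1, 2, NA, 4, 5)

# Default (na.rm = FALSE): a window that contains an NA yields NA
plain <- roll_sum(x, n = 2, align = "right", fill = NA)

# na.rm = TRUE: missing values are dropped from each window before summing
with_narm <- roll_sum(x, n = 2, align = "right", fill = NA, na.rm = TRUE)

print(plain)
print(with_narm)
```

Note that with na.rm = TRUE a window containing an NA is summed over fewer observations, so a "2-day" total may really be a 1-day total. It is also worth checking how a window made up entirely of NAs is handled before relying on the result.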