2016-06-19

How do I correctly set up arrange and group_by in dplyr?

My data is structured as follows:

Athletes = c("Gus", "Hudson", "Bobby", "Tom") 
set.seed(400) 
RawData <- data.frame(Name = rep((Athletes), each = 400), 
           Quarter = as.numeric(rep(1:4, each = 100)), 
           Sample = as.numeric(rep(1:100, each = 1)), 
           X = runif(400, 26, 30), 
           Y = runif(400, 12, 16)) 

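I wish to calculate the displacement between consecutive values of `Sample` within each `Quarter`, for each X and Y pair, for each athlete. To do this, I have set up the following code: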

DistanceOutput <- RawData %>% 
    arrange(Name, Sample, Quarter) %>% 
    group_by(Name, Quarter) %>% 
    mutate(lagX = lag(X, order_by=Sample), lagY = lag(Y, order_by=Sample)) %>% 
    rowwise() %>% 
    mutate(Distance = dist(matrix(c(X,Y,lagX,lagY),nrow=2,byrow=TRUE))) %>% 
    select(-lagX, -lagY) 
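
For reference, the value `dist()` produces for each row above is the Euclidean distance between a point and the one before it. A minimal base R sketch of that calculation, using made-up coordinates rather than the data above (the helper name `step_distance` is hypothetical):

```r
# Hypothetical helper: Euclidean distance between consecutive (x, y) points.
# The first element is NA because the first point has no predecessor.
step_distance <- function(x, y) {
  c(NA_real_, sqrt(diff(x)^2 + diff(y)^2))
}

# Made-up coordinates, chosen so the distances are easy to check by hand
x <- c(0, 3, 6)
y <- c(0, 4, 8)
step_distance(x, y)
```

With these coordinates each step is a 3-4-5 triangle, so the result is NA followed by 5 and 5.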

However, this returns a data.frame structured as follows:

> head(DistanceOutput, n=10) 
Source: local data frame [10 x 6] 

    Name Quarter Sample  X  Y Distance 
    (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) 
1 Bobby  1  1 27.82656 13.85830  NA 
2 Bobby  2  1 27.37298 15.67940  NA 
3 Bobby  3  1 28.74274 12.25703  NA 
4 Bobby  4  1 26.63564 13.07924  NA 
5 Bobby  1  2 26.32446 12.64722 1.929508 
6 Bobby  2  2 26.88957 14.52096  NA 
7 Bobby  3  2 27.53932 15.57959 3.533781 
8 Bobby  4  2 28.03031 12.70763 1.443328 
9 Bobby  1  3 29.68239 13.82739 3.559287 
10 Bobby  2  3 29.43869 12.60890 3.186531 

Instead, I would prefer my data to be laid out as follows:

> head(DistanceOutput, n=3) 
    Source: local data frame [10 x 6] 

     Name Quarter Sample  X  Y Distance 
     (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) 
    1 Bobby  1  1 27.82656 13.85830  NA 
    2 Bobby  1  2 26.32446 12.64722 1.929508 
    3 Bobby  1  3 29.68239 13.82739 3.559287 

How do I correctly set up the group_by and arrange statements within dplyr to reflect my desired output?

Thank you.


Apologies, and thank you for letting me know. – user2716568

Answer


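I think this is an ordering issue. Note that I did not include the `set.seed`, so the values below differ from those in the question.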

DistanceOutput %>% 
     arrange(Name, Quarter, Sample) %>% 
     head(3) 
# Name Quarter Sample  X  Y Distance 
# <fctr> <dbl> <dbl> <dbl> <dbl> <dbl> 
#1 Bobby  1  1 28.40293 15.40195  NA 
#2 Bobby  1  2 26.33676 14.32382 2.330544 
#3 Bobby  1  3 28.60779 14.67457 2.297951 
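
To spell the idea out, here is a minimal sketch of the whole pipeline with the sort order changed to `Name`, `Quarter`, `Sample`. It assumes dplyr is installed, uses a small made-up data frame (`toy`) instead of `RawData`, and swaps the `rowwise()`/`dist()` step for the equivalent vectorised Euclidean formula, which is a substitution on my part rather than the code from the question:

```r
library(dplyr)

# Small made-up data set: one athlete, two quarters, three samples each,
# deliberately stored in Sample-then-Quarter order
toy <- data.frame(
  Name    = "Bobby",
  Quarter = c(1, 2, 1, 2, 1, 2),
  Sample  = c(1, 1, 2, 2, 3, 3),
  X       = c(0, 10, 3, 10, 6, 13),
  Y       = c(0, 0, 4, 4, 8, 8)
)

toy_distance <- toy %>%
  group_by(Name, Quarter) %>%
  # lag() is applied within each Name/Quarter group, ordered by Sample
  mutate(Distance = sqrt((X - lag(X, order_by = Sample))^2 +
                         (Y - lag(Y, order_by = Sample))^2)) %>%
  ungroup() %>%
  # Sort last, so rows read Quarter 1 Samples 1..3, then Quarter 2 Samples 1..3
  arrange(Name, Quarter, Sample)

toy_distance
```

Each `Quarter` starts with an NA because its first `Sample` has nothing before it; the remaining rows hold the distance from the previous `Sample` in the same `Quarter`.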

When I edit my code to reflect your answer, the first athlete has NA for the whole of the first quarter. 'Source: local data frame [5 x 6] Name Quarter Sample X Y Distance (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) 1 Bobby 1 1 26.59989 14.13808 NA 2 Bobby 1 2 26.74157 15.04485 NA 3 Bobby 1 3 28.92326 12.59923 NA 4 Bobby 1 4 27.45838 14.68891 NA 5 Bobby 1 5 29.71846 15.93510 NA' – user2716568


You may also want to include 'Sample' after 'Quarter' in 'arrange' to make sure of the order. – toni057


@user2716568 It is not clear, since the expected output did not change after the set.seed. – akrun