2017-04-04
0

How to subtract subsequent rows within groups in R

I am a computer science student and a new R user.

Below is my data frame.

set.seed(1234) 
df <- data.frame(
        sex = rep(c('M','F'), 10), 
        profession = rep(c('Doctor','Lawyer'), each = 5), 
        pariticpant = rep(1:10, 2), 
        x = runif(20, 1, 10), 
        y = runif(20, 1, 10)) 

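[image: screenshot of the data]

For each participant, I want to find the difference in x and in y between days. This would produce a 10-row data frame.

dday would replace day, because the values would be differences between days.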

dday sex profession participant dx dy 
0-1 M Doctor  1   5.22 1.26 
. 
. 
. 

Can R do this?

+1

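What are you trying to do? What is the desired output (use actual numbers, and use `set.seed()` so that the random numbers are [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example))? Where does day come from? It isn't in the example `df`. – MrFlick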

+0

@MrFlick The post has been edited. –

Answers

1

It looks like the day column is missing from the data.frame but is present in the picture.

library(dplyr) 

set.seed(1234) 
df <- data.frame(day = rep(c(0, 1), each = 10), 
      sex = rep(c('M', 'F'), 10), 
      profession = rep(c('Doctor', 'Lawyer'), each = 5), 
      pariticpant = rep(1:10, 2), 
      x = runif(20, 1, 10), 
      y = runif(20, 1, 10)) 

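df %>% 
    group_by(pariticpant) %>%                  # one group per participant
    mutate(day = paste0(lag(day), "-", day),   # label the interval, e.g. "0-1"
           dx = x - lag(x),                    # change in x from the previous day
           dy = y - lag(y)) %>%                # change in y from the previous day
    select(-x, -y) %>% 
    filter(!is.na(dx))                         # drop each participant's first row (no previous day)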

Source: local data frame [10 x 8] 
Groups: pariticpant [10] 

     day    sex profession pariticpant         dx         dy
   <chr> <fctr>     <fctr>       <int>      <dbl>      <dbl>
1    0-1      M     Doctor           1  5.2189909  1.2553112
2    0-1      F     Doctor           2 -0.6959211 -0.3375603
3    0-1      M     Doctor           3 -2.9388703  1.3106358
4    0-1      F     Doctor           4  2.7004864  4.2057986
5    0-1      M     Doctor           5 -5.1173959 -0.3393300
6    0-1      F     Lawyer           6  1.7728652 -0.4583513
7    0-1      M     Lawyer           7  2.4905478 -2.9200456
8    0-1      F     Lawyer           8  0.3084325 -5.9026351
9    0-1      M     Lawyer           9 -4.3142487  1.4472483
10   0-1      F     Lawyer          10 -2.5382271  6.8542387
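
As a cross-check, the same differences can be sketched in base R without dplyr. This is only an illustration under a few assumptions: the column is spelled `participant` here, every participant has exactly one row for day 0 and one for day 1, and the names `day0`, `day1` and `diffs` are made up for the example.

```r
set.seed(1234)
df <- data.frame(day = rep(c(0, 1), each = 10),
                 sex = rep(c('M', 'F'), 10),
                 profession = rep(c('Doctor', 'Lawyer'), each = 5),
                 participant = rep(1:10, 2),
                 x = runif(20, 1, 10),
                 y = runif(20, 1, 10))

# Split the rows by day (assumes exactly one row per participant per day).
day0 <- df[df$day == 0, ]
day1 <- df[df$day == 1, ]

# Put the day-1 rows in the same participant order as the day-0 rows.
day1 <- day1[match(day0$participant, day1$participant), ]

# Day 1 minus day 0 for every participant.
diffs <- data.frame(dday = "0-1",
                    day0[c("sex", "profession", "participant")],
                    dx = day1$x - day0$x,
                    dy = day1$y - day0$y)
print(diffs)
```

For participant 1 this should give dx of about 5.22 and dy of about 1.26, in line with the first row of the output above.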
+0

Thanks! Would you mind explaining the code, mainly from 'df' and 'groupby' onward? –

+1

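Maybe I can give you a few hints. The mutate command just creates the new columns dx and dy, and the lag command (dplyr's) just shifts the vector x by one position. For example, `x <- c(1, 2, 3, 4); lag(x)` gives `[1] NA 1 2 3`, so `x - lag(x)` is nothing more than subtracting each element of x from the one that follows it. Does that point you in the right direction? – Umberto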
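
The shift described in this comment can be tried on its own. The sketch below uses a small base-R stand-in, `shift_by_one`, which is a hypothetical helper written for this illustration; on a plain vector `dplyr::lag(x)` returns the same result. Without dplyr loaded, a bare `lag()` call resolves to `stats::lag()`, which is meant for time series and does not shift the values this way.

```r
# Hypothetical helper: base-R equivalent of dplyr::lag() on a plain vector.
shift_by_one <- function(v) c(NA, head(v, -1))

x <- c(1, 2, 3, 4)
shifted <- shift_by_one(x)   # NA 1 2 3
step    <- x - shifted       # NA 1 1 1: each element minus the one before it

print(shifted)
print(step)
```

The leading `NA` is why the answer filters with `!is.na(dx)`: the first row in each group has no previous row to subtract.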

0

With the day column included, you can also do it simply like this:

library(plyr)  # provides ddply()

set.seed(1)

df <- data.frame(
    day = rep(c(0, 1), c(10, 10)),
    sex = rep(c('M', 'F'), 10),
    profession = rep(c('Doctor', 'Lawyer'), each = 5),
    participant = rep(1:10, 2),
    x = runif(20, 1, 10),
    y = runif(20, 1, 10))

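Now we need to group by sex, profession and participant, and then write a function that returns two columns holding the differences in x and in y. Remember that a function in R returns the last value it evaluates (in this case, the final data frame).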

ddply(df, c("sex", "profession", "participant"),
    function(dat) {
        # each group holds one participant's two rows: day 0 first, then day 1
        ddx <- dat$x[[2]] - dat$x[[1]]
        ddy <- dat$y[[2]] - dat$y[[1]]
        data.frame(dx = ddx, dy = ddy)
    })

Output (not reordered):

   sex profession participant         dx         dy
1    F     Doctor           2  3.9572263 -0.9337529
2    F     Doctor           4 -0.6294785  3.6342897
3    F     Lawyer           6  1.6292118 -1.7344123
4    F     Lawyer           8  0.7850676  1.2878669
5    F     Lawyer          10  2.1418901  0.3098424
6    M     Doctor           1 -3.1910030  1.8730386
7    M     Doctor           3 -4.1488559  5.5640663
8    M     Doctor           5  0.9190749 -0.2446371
9    M     Lawyer           7 -3.2924210  5.1612642
10   M     Lawyer           9  0.0743912 -5.4104425

Hope this helps. I find the ddply function easy to write and easy to understand.
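
The point in this answer about a function returning its last evaluated value can be seen in isolation. The sketch below is only an illustration; `diff_pair` is a hypothetical helper, not part of plyr or of the answer's code.

```r
# Hypothetical helper: takes a vector of two values and returns their difference.
diff_pair <- function(v) {
  d <- v[[2]] - v[[1]]
  data.frame(d = d)   # last evaluated expression, so this is what the function returns
}

res <- diff_pair(c(3, 5))
print(res)
```

No explicit `return()` is needed: the data frame built on the last line is what the caller receives, which is how the anonymous function passed to `ddply` hands back one row per group.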