How to find the difference between the values of two rows in an R data frame using dplyr

My R data frame looks like this:

df <- data.frame(period = rep(1:4, 2),
                 farm = c(rep('A', 4), rep('B', 4)),
                 cumVol = c(1, 5, 15, 31, 10, 12, 16, 24),
                 other = 1:8)

  period farm cumVol other
1      1    A      1     1
2      2    A      5     2
3      3    A     15     3
4      4    A     31     4
5      1    B     10     5
6      2    B     12     6
7      3    B     16     7
8      4    B     24     8

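How can I find the change in cumVol for each farm in each period, ignoring the 'other' column? I would like a data frame like this (optionally with the cumVol column retained):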

  period farm volume other
1      1    A      0     1
2      2    A      4     2
3      3    A     10     3
4      4    A     16     4
5      1    B      0     5
6      2    B      2     6
7      3    B      4     7
8      4    B      8     8

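In practice there may be many 'farm'-like columns and many 'other'-like columns (i.e. columns to be ignored). I would like to be able to specify all the column names using variables.

I am using the dplyr package.

Source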

2014-02-10 Racing Tadpole

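Almost certain this is a duplicate question - try: 'with(df, ave(cumVol, farm, FUN=function(x) c(0, diff(x))))' – thelatemail

Why would it be a duplicate if the OP is looking for a dplyr answer rather than a plyr one? – Vincent

In dplyr: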

require(dplyr) 
df %>% 
    group_by(farm) %>% 
    mutate(volume = cumVol - lag(cumVol, default = cumVol[1])) 

Source: local data frame [8 x 5]
Groups: farm

  period farm cumVol other volume
1      1    A      1     1      0
2      2    A      5     2      4
3      3    A     15     3     10
4      4    A     31     4     16
5      1    B     10     5      0
6      2    B     12     6      2
7      3    B     16     7      4
8      4    B     24     8      8

Or perhaps the desired output should actually be as follows?

df %>% 
    group_by(farm) %>% 
    mutate(volume = cumVol - lag(cumVol, default = 0)) 

  period farm cumVol other volume
1      1    A      1     1      1
2      2    A      5     2      4
3      3    A     15     3     10
4      4    A     31     4     16
5      1    B     10     5     10
6      2    B     12     6      2
7      3    B     16     7      4
8      4    B     24     8      8
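Both variants above take differences in row order, so they assume the rows are already sorted by period within each farm. If that is not guaranteed, a minimal sketch (assuming a reasonably recent dplyr; the row shuffle and the names `shuffled` and `res` are only for illustration) is to sort before differencing:

```r
library(dplyr)

df <- data.frame(period = rep(1:4, 2),
                 farm   = c(rep("A", 4), rep("B", 4)),
                 cumVol = c(1, 5, 15, 31, 10, 12, 16, 24),
                 other  = 1:8)

# Illustrative only: put the rows in an arbitrary order
shuffled <- df[c(8, 3, 1, 6, 2, 7, 4, 5), ]

res <- shuffled %>%
  arrange(farm, period) %>%   # restore period order within each farm
  group_by(farm) %>%
  mutate(volume = cumVol - lag(cumVol, default = first(cumVol))) %>%
  ungroup()
```

Here `first(cumVol)` plays the same role as `cumVol[1]` in the answer, so the first period of each farm gets a volume of 0.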

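Edit: Following up on your comments, I think you are looking for arrange(). If that is not the case, it would be best to start a new question.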

df1 <- data.frame(period = rep(1:4, 4),
                  farm = rep(c(rep('A', 4), rep('B', 4)), 2),
                  crop = c(rep('apple', 8), rep('pear', 8)),
                  cumCropVol = c(1, 5, 15, 31, 10, 12, 16, 24, 11, 15, 25, 31, 20, 22, 26, 34),
                  other = rep(1:8, 2))
df1 %>%
    arrange(desc(period), desc(farm)) %>%
    group_by(period, farm) %>%
    summarise(cumVol = sum(cumCropVol))

Edit: follow-up #2

df1 <- data.frame(period = rep(1:4, 4),
                  farm = rep(c(rep('A', 4), rep('B', 4)), 2),
                  crop = c(rep('apple', 8), rep('pear', 8)),
                  cumCropVol = c(1, 5, 15, 31, 10, 12, 16, 24, 11, 15, 25, 31, 20, 22, 26, 34),
                  other = rep(1:8, 2))
df <- df1 %>%
    arrange(desc(period), desc(farm)) %>%
    group_by(period, farm) %>%
    summarise(cumVol = sum(cumCropVol))

ungroup(df) %>% 
    arrange(farm) %>% 
    group_by(farm) %>% 
    mutate(volume = cumVol - lag(cumVol, default = 0)) 

Source: local data frame [8 x 4]
Groups: farm

  period farm cumVol volume
1      1    A     12     12
2      2    A     20      8
3      3    A     40     20
4      4    A     62     22
5      1    B     30     30
6      2    B     34      4
7      3    B     42      8
8      4    B     58     16

Source

2014-02-10 01:37:22 Vincent

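I believe this is not the expected output. volume should be: '> DT$volume [1] 0 4 10 16 0 2 4 8' – marbel

I updated my answer so that it provides what the OP asked for. However, I would rather leave the alternative solution in my answer, since it looks like it might be the preferred output. – Vincent

I agree with you, @Vincent. The second output looks more logical. –

tapply and transform?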

> transform(df, volumen=unlist(tapply(cumVol, farm, function(x) c(0, diff(x)))))
   period farm cumVol other volumen
A1      1    A      1     1       0
A2      2    A      5     2       4
A3      3    A     15     3      10
A4      4    A     31     4      16
B1      1    B     10     5       0
B2      2    B     12     6       2
B3      3    B     16     7       4
B4      4    B     24     8       8

ave is a better option; see @thelatemail's comment:

with(df, ave(cumVol,farm,FUN=function(x) c(0,diff(x))))
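One reason ave may be the safer choice: it returns a vector aligned with the original rows, whereas unlist(tapply(...)) concatenates the results group by group and so only lines up when the data are already sorted by farm. A small base R sketch (the interleaved copy `df2` and the names `by_ave` and `by_tapply` are made up for this illustration):

```r
df <- data.frame(period = rep(1:4, 2),
                 farm   = c(rep("A", 4), rep("B", 4)),
                 cumVol = c(1, 5, 15, 31, 10, 12, 16, 24),
                 other  = 1:8)

# Illustrative only: interleave the farms so rows are not grouped together
df2 <- df[c(1, 5, 2, 6, 3, 7, 4, 8), ]

# ave() puts each group's result back in the rows it came from
by_ave <- with(df2, ave(cumVol, farm, FUN = function(x) c(0, diff(x))))

# unlist(tapply()) returns all of farm A first, then all of farm B
by_tapply <- as.numeric(unlist(with(df2, tapply(cumVol, farm,
                                                function(x) c(0, diff(x))))))

df2$volume <- by_ave  # safe to assign as a column in any row order
```

On the sorted df from the question both approaches give the same numbers; they only diverge once the rows are out of group order.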

Source

2014-02-10 00:50:30

In dplyr - so you don't have to replace NAs

library(dplyr)
df %>%
    group_by(farm) %>%
    mutate(volume = c(0, diff(cumVol)))

  period farm cumVol other volume
1      1    A      1     1      0
2      2    A      5     2      4
3      3    A     15     3     10
4      4    A     31     4     16
5      1    B     10     5      0
6      2    B     12     6      2
7      3    B     16     7      4
8      4    B     24     8      8
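The question also asks for the column names to be held in variables, which none of the snippets so far do. A possible sketch, assuming dplyr 1.0.0 or later (for across() and all_of()); the variable names `group_cols`, `value_col` and `new_col` are hypothetical:

```r
library(dplyr)

df <- data.frame(period = rep(1:4, 2),
                 farm   = c(rep("A", 4), rep("B", 4)),
                 cumVol = c(1, 5, 15, 31, 10, 12, 16, 24),
                 other  = 1:8)

# Hypothetical variables holding the column names
group_cols <- c("farm")   # may list several 'farm'-like columns
value_col  <- "cumVol"    # cumulative column to difference
new_col    <- "volume"    # name of the column to create

res <- df %>%
  group_by(across(all_of(group_cols))) %>%
  # .data[[ ]] looks the column up by its string name;
  # !!new_col := names the result from a string
  mutate(!!new_col := c(0, diff(.data[[value_col]]))) %>%
  ungroup()
```

Columns that are not named in any of the variables (here 'other' and 'period') are simply carried along unchanged.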

Source

2014-02-10 02:12:10

OK, that is easy to fix: just replace 'cumVol[1]' with '0' –

Would creating a new column in your original dataset be an option?

Here is an option using the data.table operator :=.

require("data.table") 
DT <- data.table(df) 
DT[, volume := c(0,diff(cumVol)), by="farm"]

Or

diff_2 <- function(x) c(0,diff(x)) 
DT[, volume := diff_2(cumVol), by="farm"]

Output:

# > DT
#    period farm cumVol other volume
# 1:      1    A      1     1      0
# 2:      2    A      5     2      4
# 3:      3    A     15     3     10
# 4:      4    A     31     4     16
# 5:      1    B     10     5      0
# 6:      2    B     12     6      2
# 7:      3    B     16     7      4
# 8:      4    B     24     8      8
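If the column names need to come from variables, as the question asks, the same data.table approach can probably be adapted along these lines. This is only a sketch; `group_cols`, `value_col` and `new_col` are hypothetical names, not part of the original answer:

```r
library(data.table)

df <- data.frame(period = rep(1:4, 2),
                 farm   = c(rep("A", 4), rep("B", 4)),
                 cumVol = c(1, 5, 15, 31, 10, 12, 16, 24),
                 other  = 1:8)

# Hypothetical variables holding the column names
group_cols <- "farm"
value_col  <- "cumVol"
new_col    <- "volume"

DT <- as.data.table(df)   # a data.table copy; df itself is left untouched

# (new_col) on the left of := means "use the string stored in new_col";
# get() fetches the column whose name is stored in value_col
DT[, (new_col) := c(0, diff(get(value_col))), by = group_cols]
```

Note that := adds the column to DT by reference, so no reassignment is needed.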

Source

2014-02-10 02:51:54 marbel
