Calculating an offset cumulative sum that begins at 0 for each group

My sample data looks like this:

>   gros id nr_oriz 
>  1: 23 1  1 
>  2: 16 1  2 
>  3: 14 1  3 
>  4: 15 1  4 
>  5: 22 1  5 
>  6: 30 1  6 
>  7: 25 2  1 
>  8: 10 2  2 
>  9: 13 2  3 
>  10: 17 2  4 
>  11: 45 2  5 
>  12: 25 4  1 
>  13: 15 4  2 
>  14: 20 4  3 
>  15: 20 4  4 
>  16: 20 4  5

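where gros is the depth of each soil horizon, id is the profile number, and nr_oriz is the soil horizon number. I need to create two columns, top and bottom, where top is the upper limit of the horizon and bottom is the lower limit. So far we have only managed to obtain the bottom values, using: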

topsoil$bottom<-ave(topsoil$gros,topsoil$id,FUN=cumsum)

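But we need to somehow offset this data within each id, so that the cumulative sum starts at 0 and leaves out the last value, as in the top column of this example: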

gros id nr_oriz top bottom 
1: 23 1  1 0  23 
2: 16 1  2 23  39 
3: 14 1  3 39  53 
4: 15 1  4 53  68 
5: 22 1  5 68  90 
6: 30 1  6 90 120 
7: 25 2  1 0  25 
8: 10 2  2 25  35 
9: 13 2  3 35  48 
10: 17 2  4 48  65 
11: 45 2  5 65 110 
12: 25 4  1 0  25 
13: 15 4  2 25  40 
14: 20 4  3 40  60 
15: 20 4  4 60  80 
16: 20 4  5 80 100

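Is there a simple solution for this, given that the dataset is very large and we cannot do it by hand (as we did for the top column in this sample)?

Source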

2015-07-21 Rosca Bogdan

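You could try something like 'library(data.table); setDT(topsoil)[, top := c(0, head(cumsum(gros), -1)), by = id]' – grrgrrbla

You seem to have a 'data.table' object there, so I'd suggest you learn the proper data.table syntax. You can start here (https://github.com/Rdatatable/data.table/wiki/Getting-started)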

You can simply use ave again, this time on the bottom column and with a custom function:

topsoil$top <- ave(topsoil$bottom, topsoil$id, FUN=function(x) c(0,x[-length(x)]))
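The two ave calls can be tried end to end with the sketch below. It rebuilds the sample data from the question by hand and assumes the rows are already ordered by nr_oriz within each id; the helper name lag_from_zero is hypothetical and only wraps the anonymous function used above.

```r
# Sample data from the question (assumption: rows are already sorted by
# nr_oriz within each id, as in the printed sample).
topsoil <- data.frame(
  gros    = c(23, 16, 14, 15, 22, 30, 25, 10, 13, 17, 45, 25, 15, 20, 20, 20),
  id      = rep(c(1, 2, 4), times = c(6, 5, 5)),
  nr_oriz = c(1:6, 1:5, 1:5)
)

# Hypothetical helper: drop the last element and prepend 0.
lag_from_zero <- function(x) c(0, x[-length(x)])

# bottom: running total of horizon depths within each profile.
topsoil$bottom <- ave(topsoil$gros, topsoil$id, FUN = cumsum)

# top: the previous horizon's bottom, or 0 for the first horizon.
topsoil$top <- ave(topsoil$bottom, topsoil$id, FUN = lag_from_zero)

# Sanity check: every horizon's depth equals bottom - top.
stopifnot(all(topsoil$bottom - topsoil$top == topsoil$gros))
```

A profile with a single horizon is handled as well: x[-length(x)] is then empty, so the helper returns just 0.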

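Since it seems you are using the data.table package, you can modify your code to take advantage of data.table's syntax and performance. To compute bottom, you would simply do: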

topsoil[, bottom := cumsum(gros), by = id]

Then, to compute top:

topsoil[, top := c(0L, bottom[-.N]), by = id]

Or you can wrap them up in a single step, similar to what is shown in @akrun's answer.
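As a rough sketch of such a single step that does not rely on shift, the cumulative sum can be computed once per group and both columns derived from it. The sample data is rebuilt by hand here, the temporary name b is only illustrative, and the rows are assumed to be ordered by nr_oriz within each id.

```r
library(data.table)

# Sample data from the question (assumption: already sorted by nr_oriz
# within each id).
topsoil <- data.table(
  gros    = c(23, 16, 14, 15, 22, 30, 25, 10, 13, 17, 45, 25, 15, 20, 20, 20),
  id      = rep(c(1, 2, 4), times = c(6, 5, 5)),
  nr_oriz = c(1:6, 1:5, 1:5)
)

# Compute the running total once per group, then derive both columns.
topsoil[, c("top", "bottom") := {
  b <- cumsum(gros)          # lower limit of each horizon
  list(c(0, b[-.N]), b)      # upper limit = previous lower limit, 0 first
}, by = id]

# Sanity check: every horizon's depth equals bottom - top.
stopifnot(all(topsoil$bottom - topsoil$top == topsoil$gros))
```

Computing cumsum once avoids a second grouped pass over a large table, which may matter for the dataset size mentioned in the question.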

Source

2015-07-21 13:33:51

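Thanks, this works perfectly.

It works with the first line, but I intend to use both solutions in the future.

You can do this with shift from the devel version of data.table. Instructions for installing the devel version are here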

library(data.table)#v1.9.5+ 
setDT(topsoil)[, c('top', 'bottom'):= {tmp <- cumsum(gros) 
      list(top= shift(tmp, fill=0), bottom=tmp)}, by = id] 
topsoil 
# gros id nr_oriz top bottom 
# 1: 23 1  1 0  23 
# 2: 16 1  2 23  39 
# 3: 14 1  3 39  53 
# 4: 15 1  4 53  68 
# 5: 22 1  5 68  90 
# 6: 30 1  6 90 120 
# 7: 25 2  1 0  25 
# 8: 10 2  2 25  35 
# 9: 13 2  3 35  48 
#10: 17 2  4 48  65 
#11: 45 2  5 65 110 
#12: 25 4  1 0  25 
#13: 15 4  2 25  40 
#14: 20 4  3 40  60 
#15: 20 4  4 60  80 
#16: 20 4  5 80 100
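To see what shift contributes in the answer above, the sketch below applies it to the cumulative depths of profile 1 and compares the result with a base R equivalent. The variable names are illustrative only, and shift is assumed to be available in the installed data.table (v1.9.5+ according to the answer).

```r
library(data.table)  # shift() assumed available (v1.9.5+ per the answer)

# Cumulative depths (bottom values) for profile id 1 in the sample data.
bottom <- c(23, 39, 53, 68, 90, 120)

# shift() lags by one position by default; fill = 0 pads the first slot.
top_shift <- shift(bottom, fill = 0)

# Base R equivalent: drop the last element and prepend 0.
top_base <- c(0, head(bottom, -1))

stopifnot(identical(top_shift, top_base))
top_shift
```

Both forms give the same top values; shift simply states the intent (a lag with a fill value) more directly.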

Source

2015-07-21 13:38:42 akrun

library(dplyr)
df %>%
  group_by(id) %>%
  # default = 0 makes the first top in each group 0 instead of NA
  mutate(bottom = cumsum(gros), top = lag(bottom, default = 0)) %>%
  ungroup()

Source

2015-07-21 14:39:04
