Even simpler logic from @eddi (in the comments below), which reduces the roundabout approach shown further down to:
dt[, incr := diff(c(0, value)), by = group][, ans := cumsum(incr)]
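To see why this works, here is a minimal sketch on a made-up two-group table (the name toy and its values are illustrative assumptions, not from the question). incr is the change in a group's latest value at each row, so the running sum of incr is the sum of every group's latest value so far.

```r
library(data.table)

# Hypothetical toy data, chosen so the arithmetic is easy to follow by hand
toy <- data.table(group = c("x", "y", "x", "y"),
                  value = c(1, 2, 5, 3))

# Per group: difference from that group's previous value (0 before its first row)
toy[, incr := diff(c(0, value)), by = group]   # incr: 1, 2, 4, 1
# Across all rows: running total of those changes
toy[, ans := cumsum(incr)]                     # ans:  1, 3, 7, 8
toy[]
```

At row 3 the latest values are x = 5 and y = 2, which gives 7; at row 4 they are x = 5 and y = 3, which gives 8.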
Not sure how it extends to more groups, but here it is on example data with 3 groups:
# I hope I got the desired output correctly
require(data.table)
dt = data.table(group = c('a','b','c','a','a','b','c','a'),
value = c(10, 5, 20, 25, 15, 15, 30, 10),
desired = c(10, 15, 35, 50, 40, 50, 60, 55))
Add an rleid:
dt[, id := rleid(group)]
Extract the last row for each group, id:
last = dt[, .(value=value[.N]), by=.(group, id)]
last will have unique ids. Now the idea is to get the increment for each id, and then join + update back.
last = last[, incr := value - shift(value, type="lag", fill=0L), by=group
][, incr := cumsum(incr)-value][]
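The chained step above overwrites incr twice, which can make it hard to follow. As a sketch only, the same computation is spelled out below with one column per intermediate; the names runs, delta, total and offset are hypothetical and do not appear in the answer. It relies on the summary table being in id order, which is how the grouped result comes out here.

```r
library(data.table)

dt <- data.table(group = c('a','b','c','a','a','b','c','a'),
                 value = c(10, 5, 20, 25, 15, 15, 30, 10))
dt[, id := rleid(group)]                               # run ids: 1 2 3 4 4 5 6 7
runs <- dt[, .(value = value[.N]), by = .(group, id)]  # last value of each run

# Change in a group's closing value since its previous run (0 before the first)
runs[, delta := value - shift(value, type = "lag", fill = 0), by = group]
# Sum of every group's latest value at the end of each run
runs[, total := cumsum(delta)]
# Contribution of the *other* groups while this run is active
runs[, offset := total - value]
runs[]
# delta:  10  5 20  5 10 10 -5
# total:  10 15 35 40 50 60 55
# offset:  0 10 15 25 35 30 45
```

Adding offset to each row's own value within the run then gives the answer for that row.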
Now join + update:
dt[last, ans := value + i.incr, on="id"][, id := NULL][]
# group value desired ans
# 1: a 10 10 10
# 2: b 5 15 15
# 3: c 20 35 35
# 4: a 25 50 50
# 5: a 15 40 40
# 6: b 15 50 50
# 7: c 30 60 60
# 8: a 10 55 55
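I'm not yet sure where/if this breaks; I'll look at it carefully. I wrote it up right away so that there are more eyes on it.
Comparing against David's solution on 500 groups with 10,000 rows: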
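One way to look for breakage is to compare against a slow but obvious reference. The sketch below assumes the intended result for row i is the sum of the most recent value of every group seen in rows 1..i; brute_force and check are hypothetical names introduced for illustration, and the check uses the diff/cumsum one-liner rather than the join-based version.

```r
library(data.table)

# Hypothetical reference implementation: quadratic, so only for small inputs
brute_force <- function(group, value) {
  vapply(seq_along(value), function(i) {
    g <- group[seq_len(i)]
    v <- value[seq_len(i)]
    # !duplicated(fromLast = TRUE) keeps the last occurrence of each group
    sum(v[!duplicated(g, fromLast = TRUE)])
  }, numeric(1))
}

set.seed(1L)
check <- data.table(group = sample(letters[1:5], 200L, TRUE),
                    value = as.numeric(sample(100L, 200L, TRUE)))
check[, incr := diff(c(0, value)), by = group][, ans := cumsum(incr)]

# Compare the fast answer with the reference
all.equal(check$ans, brute_force(check$group, check$value))
```

Changing the seed, the number of groups, or the run lengths is a cheap way to probe edge cases such as a single group or long runs of one group.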
require(data.table)
set.seed(45L)
groups = apply(matrix(sample(letters, 500L*10L, TRUE), ncol=10L), 1L, paste, collapse="")
uniqueN(groups) # 500L
N = 1e4L
dt = data.table(group=sample(groups, N, TRUE), value = sample(100L, N, TRUE))
arun <- function(dt) {
  dt[, id := rleid(group)]                            # run id per consecutive group
  last = dt[, .(value=value[.N]), by=.(group, id)]    # last value of each run
  last = last[, incr := value - shift(value, type="lag", fill=0L), by=group
            ][, incr := cumsum(incr)-value][]
  dt[last, ans := value + i.incr, on="id"][, id := NULL][]   # join + update
  dt$ans
}
david <- function(dt) {
  dt[, indx := .I]
  res <- dcast(dt, indx ~ group)                      # one column per group
  # carry each group's last seen value forward with a rolling join
  for (j in names(res)[-1L])
    set(res, j = j, value = res[!is.na(res[[j]])][res, on = "indx", roll = TRUE][[j]])
  rowSums(as.matrix(res)[, -1], na.rm = TRUE)
}
system.time(ans1 <- arun(dt)) ## 0.024s
system.time(ans2 <- david(dt)) ## 38.97s
identical(ans1, as.integer(ans2))
# [1] TRUE
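This is great, thank you! I'm a bit confused about the join and rleid though - wouldn't just 'dt[, incr := diff(c(0, value)), by = group][, ans := cumsum(incr)]' work? (I'm not sure if I'm missing some logic) – eddi
Oh yes, I think that would work! This roundabout way boils down to your one-liner. – Arun
Wow, that is a nice one-liner... – andrew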