2016-03-14 74 views
5

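Counting the number of values in a window of a data.table, per group

I am trying to add a new column to a data.table, where the value in each row depends on how that row's value relates to the values in other rows. More precisely, if a row holds a value X, I want to know how many other values in the same column (and the same group) lie within the range X-30. In the example below, the grouping column is X and the values are in Y.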

That is, given this:

library(data.table)

DT <- data.table(
  X = c(1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1),
  Y = c(100, 101, 133, 134, 150, 156, 190, 200, 201, 230, 233, 234),
  Z = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)

I would like to get a new column with the values:

N <- c(0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 2) 

I tried the following, but it does not give me a result I can use:

DT[,list(Y,num=cumsum(Y[-.I]>DT[.I,Y]-30),Z),by=.(X)] 

Any ideas how to do this?

Answers

6

This could probably be achieved with a rolling join too, but here is a foverlaps alternative for now:

DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)] # add a row index and the lower bound of the 30-wide window
setkey(DT, X, Y2, Y) # key by X and the interval columns (required by foverlaps)
res <- foverlaps(DT, DT)[Y2 > i.Y2, .N, keyby = indx] # self-overlap join; count, per row, the overlapping rows with a smaller Y
setorder(DT, indx) # go back to the original order
DT[res$indx, N := res$N][, c("indx", "Y2") := NULL] # write the counts and remove the helper columns
DT
#     X   Y  Z N
#  1: 1 100  1 0
#  2: 2 101  2 0
#  3: 2 133  3 0
#  4: 1 134  4 0
#  5: 1 150  5 1
#  6: 2 156  6 1
#  7: 1 190  7 0
#  8: 2 200  8 0
#  9: 2 201  9 1
# 10: 1 230 10 0
# 11: 1 233 11 1
# 12: 1 234 12 2

Alternatively, use the which=TRUE option of foverlaps to make the overlap merge smaller:

# as above 
DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)] 
setkey(DT, X, Y2, Y) 

# using which=TRUE: 
res <- foverlaps(DT, DT, which=TRUE)[xid > yid, .N, by=xid] 
DT[res$xid, N := res$N] 
setorder(DT, indx) 
DT[, c("Y2","indx") := NULL] 
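
To sanity-check either variant, a brute-force count in base R can serve as a reference. This is only a sketch: the helper `count_window`, its `width` argument, and the name `N_check` are made up for illustration, and it assumes the window is the open interval (Y - 30, Y), which the expected output in the question is consistent with but does not pin down at the boundaries.

```r
# Data from the question: X is the group, Y the value.
X <- c(1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1)
Y <- c(100, 101, 133, 134, 150, 156, 190, 200, 201, 230, 233, 234)

# Hypothetical helper (not from the answers): for each row i, count the other
# rows in the same group whose value lies strictly between y[i] - width and y[i].
count_window <- function(x, y, width = 30) {
  vapply(seq_along(y), function(i) {
    sum(x == x[i] & y > y[i] - width & y < y[i])
  }, integer(1))
}

N_check <- count_window(X, Y)
N_check
# [1] 0 0 0 0 1 1 0 0 1 0 1 2
```

This compares every row with every other row, so it is quadratic in the number of rows and only suitable for checking small inputs.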
4

Here is another way:

DT[order(Y), N := 0:(.N-1) - findInterval(Y - 30, Y), by = X] 

all.equal(DT$N,N) # TRUE 
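
To see why the findInterval one-liner works, it may help to walk through a single group by hand. The sketch below uses the Y values of the group X == 1 from the question, already sorted; the variable names `earlier`, `too_far_back` and `n_in_window` are illustrative only.

```r
# Sorted Y values of the group X == 1 from the question.
y <- c(100, 134, 150, 190, 230, 233, 234)

# Number of rows that come before each row in sorted order: 0, 1, ..., 6.
earlier <- seq_along(y) - 1L

# findInterval(y - 30, y) gives, for each row, how many values are <= y - 30,
# i.e. the earlier rows that fall outside the 30-wide window.
too_far_back <- findInterval(y - 30, y)
too_far_back
# [1] 0 1 1 3 4 4 4

# What remains are the earlier rows inside the window.
n_in_window <- earlier - too_far_back
n_in_window
# [1] 0 0 1 0 0 1 2
```

These are the values expected for rows 1, 4, 5, 7, 10, 11 and 12 of the question's N. Because findInterval needs a sorted vector, the data.table call orders by Y first and then applies the same arithmetic within each X group.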