2017-03-09 48 views
2

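Identify a jump from 0 to 1 using data.table

I have the following data.table with a variable of interest x. I would like to create another variable that indicates a jump in x from 0 to 1, meaning that the variable has been 0 until a certain year and is 1 in all years thereafter. This should be done by id_d.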

Is there a simple data.table way to do this?

Original data.table:

library(data.table)

fullDat <- data.table(id_d = rep(letters[1:3], each = 12),
                      year = rep(1:12, 3),
                      x = c(rep(0, 5), rep(1, 7), 0, 1, 0, 1, 2, 2, 4, rep(5, 5), 1, rep(0, 3), rep(1, 8)))

    id_d year x 
    1: a 1 0 
    2: a 2 0 
    3: a 3 0 
    4: a 4 0 
    5: a 5 0 
    6: a 6 1 
    7: a 7 1 
    8: a 8 1 
    9: a 9 1 
10: a 10 1 
11: a 11 1 
12: a 12 1 
13: b 1 0 
14: b 2 1 
15: b 3 0 
16: b 4 1 
17: b 5 2 
18: b 6 2 
19: b 7 4 
20: b 8 5 
21: b 9 5 
22: b 10 5 
23: b 11 5 
24: b 12 5 
25: c 1 1 
26: c 2 0 
27: c 3 0 
28: c 4 0 
29: c 5 1 
30: c 6 1 
31: c 7 1 
32: c 8 1 
33: c 9 1 
34: c 10 1 
35: c 11 1 
36: c 12 1 
id_d year x 

What the result should look like:

id_d year x jump 
    1: a 1 0 0 
    2: a 2 0 0 
    3: a 3 0 0 
    4: a 4 0 0 
    5: a 5 0 0 
    6: a 6 1 1 
    7: a 7 1 0 
    8: a 8 1 0 
    9: a 9 1 0 
10: a 10 1 0 
11: a 11 1 0 
12: a 12 1 0 
13: b 1 0 0 
14: b 2 1 0 
15: b 3 0 0 
16: b 4 1 0 
17: b 5 2 0 
18: b 6 2 0 
19: b 7 4 0 
20: b 8 5 0 
21: b 9 5 0 
22: b 10 5 0 
23: b 11 5 0 
24: b 12 5 0 
25: c 1 1 0 
26: c 2 0 0 
27: c 3 0 0 
28: c 4 0 0 
29: c 5 1 0 
30: c 6 1 0 
31: c 7 1 0 
32: c 8 1 0 
33: c 9 1 0 
34: c 10 1 0 
35: c 11 1 0 
36: c 12 1 0 
id_d year x jump 

Answers

2

We can do

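fullDat[, jump := {
  i1 <- which.max(x)  # position of the first maximum of x within the group
  # flag that position only if x equals 1 from there to the end of the group
  if (all(x[i1:.N] == 1)) replace(rep(0, .N), i1, 1) else 0
}, id_d]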
fullDat 
# id_d year x jump 
# 1: a 1 0 0 
# 2: a 2 0 0 
# 3: a 3 0 0 
# 4: a 4 0 0 
# 5: a 5 0 0 
# 6: a 6 1 1 
# 7: a 7 1 0 
# 8: a 8 1 0 
# 9: a 9 1 0 
#10: a 10 1 0 
#11: a 11 1 0 
#12: a 12 1 0 
#13: b 1 0 0 
#14: b 2 1 0 
#15: b 3 0 0 
#16: b 4 1 0 
#17: b 5 2 0 
#18: b 6 2 0 
#19: b 7 4 0 
#20: b 8 5 0 
#21: b 9 5 0 
#22: b 10 5 0 
#23: b 11 5 0 
#24: b 12 5 0 
#25: c 1 1 0 
#26: c 2 0 0 
#27: c 3 0 0 
#28: c 4 0 0 
#29: c 5 1 0 
#30: c 6 1 0 
#31: c 7 1 0 
#32: c 8 1 0 
#33: c 9 1 0 
#34: c 10 1 0 
#35: c 11 1 0 
#36: c 12 1 0 

Or a slightly more compact option

fullDat[, jump := if(all(cumsum(diff(x)) %in% c(0,1))) c(0, diff(x)) else 0 ,id_d] 

FYI, this fails on DT = data.table(id_d = rep("d", 4), year = 1:4, x = c(0,1,0,1)) (and the long and "compact" versions actually give different results). – Frank

@Frank I didn't test all the cases. – akrun
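
Frank's counterexample is easy to check. The sketch below is only an illustration: it builds the four-row table from his comment and applies both expressions from this answer to the same table. The column names jump_long and jump_compact are made up here so the two results can sit side by side.

```r
library(data.table)

# Frank's four-row counterexample: x goes 0, 1, 0, 1
DT <- data.table(id_d = rep("d", 4), year = 1:4, x = c(0, 1, 0, 1))

# Long form from the answer, stored in a hypothetical column jump_long
DT[, jump_long := {
  i1 <- which.max(x)
  if (all(x[i1:.N] == 1)) replace(rep(0, .N), i1, 1) else 0
}, by = id_d]

# Compact form from the answer, stored in a hypothetical column jump_compact
DT[, jump_compact := if (all(cumsum(diff(x)) %in% c(0, 1))) c(0, diff(x)) else 0,
   by = id_d]

print(DT)
```

On this input the long form leaves jump_long at 0 in every row, while the compact form returns 0, 1, -1, 1, so the two are not interchangeable.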

3

"the variable has been 0 until a certain year and 1 in all years thereafter"

# find rows to assign one 
wDT = fullDat[, .(year = year[with(rle(x), 
    if (identical(values, c(0, 1))) first(lengths) + 1L 
    else 0L 
)]), by=id_d] 

# initialize to zero 
fullDat[, jump := 0L ] 
# update join to assign ones 
fullDat[wDT, on=.(id_d, year), jump := 1L ] 

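There is no need to make the intermediate table wDT; writing its full code into the final statement works too. In fact, this could all be done in one line if desired, like...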

DT[, x := 0L][code_for_wDT, on=on_cols, x := 1L] 
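
Spelled out against fullDat, that one-liner could look roughly like the following. This is a sketch rather than a canonical form: it inlines the rle() expression used to build wDT and chains the initialization and the update join. It assumes x is stored as double, as in the question's data.

```r
library(data.table)

fullDat <- data.table(id_d = rep(letters[1:3], each = 12),
                      year = rep(1:12, 3),
                      x = c(rep(0, 5), rep(1, 7), 0, 1, 0, 1, 2, 2, 4,
                            rep(5, 5), 1, rep(0, 3), rep(1, 8)))

# Initialize jump to 0, then update-join on the (id_d, year) pairs where
# a group's x is exactly one run of 0s followed by one run of 1s.
fullDat[, jump := 0L][
  fullDat[, .(year = year[with(rle(x),
      if (identical(values, c(0, 1))) first(lengths) + 1L else 0L)]),
    by = id_d],
  on = .(id_d, year), jump := 1L]

print(fullDat[jump == 1L])
```

Groups without a clean switch index year with 0L, which yields a zero-length vector, so they contribute no rows to the join.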

Or, instead of a join, just use the row numbers from .I

# find rows to assign one 
w = fullDat[, with(rle(x), .I[ 
    if (identical(values, c(0, 1))) first(lengths) + 1L 
    else 0L 
]), by=id_d]$V1 

# initialize to zero 
fullDat[, jump := 0L ] 
# update to assign ones 
fullDat[w, jump := 1L ] 
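
To see why the rle() test in both variants picks out only group a, it can help to look at the run-length encoding of each group's x on its own. The vectors x_a, x_b and x_c below are hypothetical names for the three groups' values copied from the question. Note the assumption that x is double: identical() is type-sensitive, so with an integer x the comparison against c(0, 1) would fail.

```r
# x values per group, copied from the question (names are illustrative only)
x_a <- c(rep(0, 5), rep(1, 7))
x_b <- c(0, 1, 0, 1, 2, 2, 4, rep(5, 5))
x_c <- c(1, rep(0, 3), rep(1, 8))

r_a <- rle(x_a)
r_a$values                           # the runs are 0 then 1
r_a$lengths                          # of length 5 and 7
identical(r_a$values, c(0, 1))       # TRUE: a single switch from 0 to 1
r_a$lengths[1] + 1L                  # first row after the run of zeros

identical(rle(x_b)$values, c(0, 1))  # FALSE: many runs, values above 1
identical(rle(x_c)$values, c(0, 1))  # FALSE: the runs are 1, 0, 1

# Type caveat: an integer x never matches the double vector c(0, 1)
identical(rle(as.integer(x_a))$values, c(0, 1))  # FALSE
```

If x might be integer, comparing against c(0L, 1L) or converting with as.numeric() first is one way around this.
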
0
fullDat[, jump := (cumsum(x==0)==(1:.N - 1L)) & (rev(cumsum(rev(x==1))) == .N:1), id_d] 

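How this works:

  1. cumsum(x==0) == (1:.N - 1L) checks that the number of zeros up to and including this row equals the number of preceding rows
  2. rev(cumsum(rev(x==1))) == .N:1 checks that the number of ones counted backwards from the last row (down to and including this row) equals the number of rows from here to the end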
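
The two conditions can be tried on plain vectors outside data.table. The helper is_jump() below is a hypothetical wrapper written for this illustration, with length(x) standing in for .N; it is a sketch of the same logic, not part of the answer.

```r
# Hypothetical helper: applies the two conditions to one group's x
is_jump <- function(x) {
  n <- length(x)
  # TRUE where the count of zeros in rows 1..i equals i - 1
  cond_zeros <- cumsum(x == 0) == (seq_len(n) - 1L)
  # TRUE where the count of ones in rows i..n equals n - i + 1,
  # i.e. every row from i to the end is 1
  cond_ones <- rev(cumsum(rev(x == 1))) == n:1
  cond_zeros & cond_ones
}

is_jump(c(0, 0, 1, 1))  # clean switch: only the third position is TRUE
is_jump(c(0, 1, 0, 2))  # Frank's case: no position is TRUE
```

The result is a logical vector; wrapping it in as.integer() gives the 0/1 coding shown in the question.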
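
Too clever by half, in my opinion. For example, it breaks on DT = data.table(id_d = rep("d", 4), year = 1:4, x = c(0,1,0,2)). – Frank

Good point @Frank, I have added an additional logical test to check for the zero and one values. It should catch these cases now. – dww

Yes, I think it works in all cases now. Still, I find it unnecessarily obscure. – Frank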