data.table: Mark before/after a symbol occurs within group
Feel free to edit this title to make it easier to understand or more general.
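I have 3 columns that form groups in a data.table object (id, id2, pol_loc). Within these groups are rows of observations, and one row in each group will contain an asterisk (or NA). I want to efficiently create an indicator column for each row relative to the asterisk (before = 1, after = 0). Here is what the data.table looks like: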
id id2 pol_loc non_pol cluster_tag
1: 1 1 3 do NA
2: 1 1 3 you NA
3: 1 1 3 * NA
4: 1 1 3 it NA
-------------------------------------
5: 1 2 3 but 4
6: 1 2 3 i NA
7: 1 2 3 * NA
8: 1 2 3 really 2
9: 1 2 3 bad NA
-------------------------------------
10: 1 2 5 but 4
11: 1 2 5 i NA
12: 1 2 5 hate NA
13: 1 2 5 really 2
14: 1 2 5 * NA
15: 1 2 5 dogs NA
-------------------------------------
16: 2 1 4 i NA
17: 2 1 4 am NA
18: 2 1 4 the NA
19: 2 1 4 * NA
20: 2 1 4 friend NA
-------------------------------------
21: 3 1 4 do NA
22: 3 1 4 you NA
23: 3 1 4 really 2
24: 3 1 4 * NA
-------------------------------------
25: 3 2 NA NA NA
id id2 pol_loc non_pol cluster_tag
Desired output:
id id2 pol_loc non_pol cluster_tag before
1: 1 1 3 do NA 1
2: 1 1 3 you NA 1
3: 1 1 3 * NA NA
4: 1 1 3 it NA 0
----------------------------------------------
5: 1 2 3 but 4 1
6: 1 2 3 i NA 1
7: 1 2 3 * NA NA
8: 1 2 3 really 2 0
9: 1 2 3 bad NA 0
----------------------------------------------
10: 1 2 5 but 4 1
11: 1 2 5 i NA 1
12: 1 2 5 hate NA 1
13: 1 2 5 really 2 1
14: 1 2 5 * NA NA
15: 1 2 5 dogs NA 0
----------------------------------------------
16: 2 1 4 i NA 1
17: 2 1 4 am NA 1
18: 2 1 4 the NA 1
19: 2 1 4 * NA NA
20: 2 1 4 friend NA 0
----------------------------------------------
21: 3 1 4 do NA 1
22: 3 1 4 you NA 1
23: 3 1 4 really 2 1
24: 3 1 4 * NA NA
----------------------------------------------
25: 3 2 NA NA NA NA
id id2 pol_loc non_pol cluster_tag before
MWE
dat <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L),
id2 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), pol_loc = c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, NA), non_pol = c("do", "you",
"*", "it", "but", "i", "*", "really", "bad", "but", "i",
"hate", "really", "*", "dogs", "i", "am", "the", "*", "friend",
"do", "you", "really", "*", NA), cluster_tag = c(NA, NA,
NA, NA, "4", NA, NA, "2", NA, "4", NA, NA, "2", NA, NA, NA,
NA, NA, NA, NA, NA, NA, "2", NA, NA)), row.names = c(NA,
-25L), class = "data.frame", .Names = c("id", "id2", "pol_loc",
"non_pol", "cluster_tag"))
library(data.table)
setDT(dat)
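EDIT: If it makes things easier or more efficient, the NA on the asterisk row could be 0 or 1; it makes no difference.
I'm guessing this is more efficient.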
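The comments below refer to a '1-cumsum' idiom. A minimal sketch of that idea follows, assuming the grouping columns from the question; the small table toy is a hypothetical stand-in for dat, and this is not necessarily the exact code the answer used.

```r
library(data.table)

# Hypothetical stand-in for `dat`: two groups containing "*" and one all-NA group
toy <- data.table(
  id      = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L),
  id2     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L),
  pol_loc = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, NA),
  non_pol = c("do", "you", "*", "it", "i", "*", "bad", NA)
)

# Within each group, cumsum() is 0 before the "*" and 1 from the "*" onward,
# so 1 - cumsum() gives 1 for rows before the asterisk and 0 otherwise.
# A group whose non_pol is NA yields NA.
toy[, before := 1L - cumsum(non_pol == "*"), by = .(id, id2, pol_loc)]

# Optional step: set the "*" row itself to NA to match the desired output.
# Rows where the condition is NA are not selected.
toy[non_pol == "*", before := NA_integer_]

print(toy)
```

If the asterisk row may be 0, as the edit above allows, the second assignment can be dropped.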
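This one is much better. – akrun
So simple, but I wouldn't have thought to go this route. Awesome.
'1-cumsum' looks odd to me as a way to create a 0/1 variable. I would use 'before := +(.I <= .I[which(non_pol == "*")])' or '1:.N <= which(non_pol == "*")' – Frank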
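The position-comparison idea from the last comment can be sketched as below. The helper mark_before and the table toy are hypothetical names introduced for illustration. Unlike the expressions in the comment, this version uses a strict comparison, sets the asterisk row to NA, and returns NA for a group with no asterisk, so that every group yields a vector of the right length.

```r
library(data.table)

# Hypothetical stand-in for `dat`: two groups containing "*" and one all-NA group
toy <- data.table(
  id      = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L),
  id2     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L),
  pol_loc = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, NA),
  non_pol = c("do", "you", "*", "it", "i", "*", "bad", NA)
)

# Hypothetical helper: compare each row position with the position of the first "*"
mark_before <- function(x) {
  star <- which(x == "*")            # which() drops NA, so an all-NA group gives integer(0)
  if (length(star) == 0L) {
    return(rep(NA_integer_, length(x)))
  }
  out <- as.integer(seq_along(x) < star[1L])  # 1 before the asterisk, 0 from it onward
  out[star[1L]] <- NA_integer_                # the asterisk row itself becomes NA
  out
}

toy[, before := mark_before(non_pol), by = .(id, id2, pol_loc)]

print(toy)
```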