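I want the functionality of .SD with by = in a non-equi join: how do I select the first n rows of each group, in specified columns (after the join)?
Related: data.table - select first n rows within group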
Example data:
library(data.table)
tmp_dt1 <- data.table(grp = c(1, 2), time = c(0.2, 0.6, 0.4, 0.8, 0.25, 0.65))  # grp is recycled to 6 rows
tmp_dt2 <- data.table(grp = c(1, 2), time_from = c(0.1, 0.5))
tmp_dt2[, time_to := time_from + 0.2]  # := adds the column by reference, no reassignment needed
> tmp_dt1
grp time
1: 1 0.20
2: 2 0.60
3: 1 0.40
4: 2 0.80
5: 1 0.25
6: 2 0.65
> tmp_dt2
grp time_from time_to
1: 1 0.1 0.3
2: 2 0.5 0.7
Now, the output I need is the first time in each group that falls within the range defined in tmp_dt2. I can get all such times with a non-equi join:
> tmp_dt1[tmp_dt2, .(grp, time = x.time, time_from, time_to), on = .(grp, time >= time_from, time <= time_to)]
grp time time_from time_to
1: 1 0.20 0.1 0.3
2: 1 0.25 0.1 0.3
3: 2 0.60 0.5 0.7
4: 2 0.65 0.5 0.7
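However, I am having trouble using by to extract the first n rows of each grp without chaining. For example, when n = 1, the desired output (obtained here with chaining) is: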
tmp_dt1[tmp_dt2, .(grp, time = x.time, time_from, time_to),
on = .(grp, time >= time_from, time <= time_to)][, .SD[1], by = grp]
grp time time_from time_to
1: 1 0.2 0.1 0.3
2: 2 0.6 0.5 0.7
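For the n = 1 case only, one way to skip the second [...] call may be the mult argument of [.data.table. This is a minimal sketch, assuming a data.table version whose non-equi joins honour mult = "first"; it reuses the j expression from the join above. Which match counts as "first" follows data.table's join order rather than an explicit sort on time, so sorting tmp_dt1 by grp and time beforehand (for example with setorder) is a sensible precaution when the earliest time is what you want.

```r
library(data.table)

# Same rows as tmp_dt1 / tmp_dt2 in the question
tmp_dt1 <- data.table(grp = rep(c(1, 2), times = 3),
                      time = c(0.2, 0.6, 0.4, 0.8, 0.25, 0.65))
tmp_dt2 <- data.table(grp = c(1, 2), time_from = c(0.1, 0.5))
tmp_dt2[, time_to := time_from + 0.2]

# mult = "first" keeps at most one matching row of tmp_dt1 per row of
# tmp_dt2, so j only ever sees one match per range (n = 1 only)
res_first <- tmp_dt1[tmp_dt2,
                     .(grp, time = x.time, time_from, time_to),
                     on = .(grp, time >= time_from, time <= time_to),
                     mult = "first"]
print(res_first)
```

This does not generalise to n > 1, since mult only offers "all", "first" and "last".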
However, this:
> tmp_dt1[tmp_dt2, .(time = x.time[1], time_from[1], time_to[1]), on = .(grp, time >= time_from, time <= time_to), by = grp]
Error in `[.data.table`(tmp_dt1, tmp_dt2, .(time = x.time[1], time_from[1], :
object 'time_from' not found
does not work.
Using .SD comes close, but the result is wrong in terms of the selected columns (time_from and time_to are missing):
tmp_dt1[tmp_dt2, .SD[1], on = .(grp, time >= time_from, time <= time_to), by = grp]
grp time
1: 1 0.2
2: 2 0.6
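For general n, a possible approach is by = .EACHI, which evaluates j once per row of tmp_dt2 (that is, once per range) instead of grouping the joined result by grp afterwards. The helper first_n_in_range and the column name first_time are hypothetical, introduced only for this sketch. x.time is used because in a non-equi join the plain name time refers to the join column, which carries the values from tmp_dt2; for the same reason the grouping columns of the result may be labelled grp, time, time while holding grp, time_from and time_to.

```r
library(data.table)

# Same rows as tmp_dt1 / tmp_dt2 in the question
tmp_dt1 <- data.table(grp = rep(c(1, 2), times = 3),
                      time = c(0.2, 0.6, 0.4, 0.8, 0.25, 0.65))
tmp_dt2 <- data.table(grp = c(1, 2), time_from = c(0.1, 0.5))
tmp_dt2[, time_to := time_from + 0.2]

# Hypothetical helper: first n matching times of dt for each row of ranges.
# by = .EACHI runs j separately for each row of `ranges`, so head() only
# sees the matches of that one range.
first_n_in_range <- function(dt, ranges, n) {
  dt[ranges,
     .(first_time = head(x.time, n)),  # x. prefix: take time from dt itself
     on = .(grp, time >= time_from, time <= time_to),
     by = .EACHI]
}

res1 <- first_n_in_range(tmp_dt1, tmp_dt2, n = 1)
res2 <- first_n_in_range(tmp_dt1, tmp_dt2, n = 2)
print(res2)
```

The per-range output has at most n rows per range, so renaming the duplicated time columns afterwards should be cheap; whether this lowers peak memory compared with chaining on the real data is worth measuring.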
The reason I don't want to chain is memory issues. Please note that I am only interested in solutions using the data.table package.
Thanks for your answer, and also for the very useful link explaining the 'x.' notation. – Alex