2017-01-04

How to update specific rows with a constant value by group?

Say we have the following:

library(data.table); library(zoo) 
dt <- data.table(grp = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3), period = seq.Date(from = as.Date('2014-01-01'), to = as.Date('2014-05-01'), by = 'month'), x=c(1:15), y=c(11:25)) 
dt[, period:=as.yearmon(period, '%Y-%m-%d')] 

This returns:

grp period x y 
1: 1 Jan 2014 1 11 
2: 1 Feb 2014 2 12 
3: 1 Mar 2014 3 13 
4: 1 Apr 2014 4 14 
5: 1 May 2014 5 15 
6: 2 Jan 2014 6 16 
7: 2 Feb 2014 7 17 
8: 2 Mar 2014 8 18 
9: 2 Apr 2014 9 19 
10: 2 May 2014 10 20 
11: 3 Jan 2014 11 21 
12: 3 Feb 2014 12 22 
13: 3 Mar 2014 13 23 
14: 3 Apr 2014 14 24 
15: 3 May 2014 15 25 

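I want to update columns x and y in the rows after March 2014 with each group's March 2014 values, so that the result looks like this: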

grp period x y 
1: 1 Jan 2014 1 11 
2: 1 Feb 2014 2 12 
3: 1 Mar 2014 3 13 
4: 1 Apr 2014 3 13 
5: 1 May 2014 3 13 
6: 2 Jan 2014 6 16 
7: 2 Feb 2014 7 17 
8: 2 Mar 2014 8 18 
9: 2 Apr 2014 8 18 
10: 2 May 2014 8 18 
11: 3 Jan 2014 11 21 
12: 3 Feb 2014 12 22 
13: 3 Mar 2014 13 23 
14: 3 Apr 2014 13 23 
15: 3 May 2014 13 23 

I tried the code below, but it only takes the values from row 3 (group 1's March row) and uses them for every group.

dt[which(period > dt[3, period]),`:=`(x=dt[3, x], y = dt[3, y]), by=grp] 

Could you give me some advice?

Possible duplicate of http://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value – akrun

Maybe also …by = grp]$V1, `:=`(x = x[1], y = y[1]), by = grp] (if dt is sorted) or dt[period >= "Mar 2014", `:=`(x = x[1], y = y[1]), by = grp]? – lukeA

Answers


You can replace all x and y values after March 2014 with NA, and then fill them forward with na.locf() from zoo:

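# Blank out x and y after March, then carry the last non-NA value forward within each group
dt[period > "March 2014", `:=`(x = NA, y = NA)][, `:=`(x = na.locf(x), y = na.locf(y)), by = grp]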
# grp period x y 
# 1: 1 Jan 2014 1 11 
# 2: 1 Feb 2014 2 12 
# 3: 1 Mar 2014 3 13 
# 4: 1 Apr 2014 3 13 
# 5: 1 May 2014 3 13 
# 6: 2 Jan 2014 6 16 
# 7: 2 Feb 2014 7 17 
# 8: 2 Mar 2014 8 18 
# 9: 2 Apr 2014 8 18 
#10: 2 May 2014 8 18 
#11: 3 Jan 2014 11 21 
#12: 3 Feb 2014 12 22 
#13: 3 Mar 2014 13 23 
#14: 3 Apr 2014 13 23 
#15: 3 May 2014 13 23 
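na.locf() means "last observation carried forward". As a rough illustration of what that fill does, and of why it is safer to run it within each group, here is a base-R sketch. locf() is a hypothetical helper written for this example (it is not zoo's implementation), and ave() applies it per group:

```r
# Hypothetical helper, not zoo::na.locf(): last observation carried forward.
# Leading NAs (no earlier observation in the vector) stay NA.
locf <- function(v) {
  # Position of the most recent non-NA element at each index (0 = none seen yet)
  last_seen <- cummax(ifelse(is.na(v), 0L, seq_along(v)))
  last_seen[last_seen == 0L] <- NA_integer_
  v[last_seen]
}

# Column x after blanking everything past March, first two groups only
x   <- c(1L, 2L, 3L, NA, NA, 6L, 7L, 8L, NA, NA)
grp <- rep(1:2, each = 5)
ave(x, grp, FUN = locf)   # 1 2 3 3 3 6 7 8 8 8

# If a group starts with NA, an ungrouped fill leaks the previous group's value
x2 <- c(1L, NA, NA, 4L)
g2 <- c(1, 1, 2, 2)
locf(x2)                  # 1 1 1 4  (third value borrowed from group 1)
ave(x2, g2, FUN = locf)   # 1 1 NA 4 (third value stays NA)
```

With the question's data, every group's Jan to Mar rows are non-NA, so grouped and ungrouped fills agree; the difference only appears when a group's first rows are NA.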

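An option with dplyr: filter the data to the rows where period is greater than or equal to Mar 2014, group by grp, and assign to all of those rows the x and y values from the Mar 2014 period.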

library(dplyr) 
dt[dt$period >= "Mar 2014"] <- dt %>% 
           filter(period >= "Mar 2014") %>% 
           group_by(grp) %>% 
           mutate(x = x[period == "Mar 2014"], 
             y = y[period == "Mar 2014"]) 

dt 
# grp period x y 
#1: 1 Jan 2014 1 11 
#2: 1 Feb 2014 2 12 
#3: 1 Mar 2014 3 13 
#4: 1 Apr 2014 3 13 
#5: 1 May 2014 3 13 
#6: 2 Jan 2014 6 16 
#7: 2 Feb 2014 7 17 
#8: 2 Mar 2014 8 18 
#9: 2 Apr 2014 8 18 
#10: 2 May 2014 8 18 
#11: 3 Jan 2014 11 21 
#12: 3 Feb 2014 12 22 
#13: 3 Mar 2014 13 23 
#14: 3 Apr 2014 13 23 
#15: 3 May 2014 13 23 
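The pipeline above looks up, for each group, the value in the Mar 2014 row and broadcasts it to that group's later rows. A base-R sketch of the same lookup, assuming months are coded as integers 1 (Jan) to 5 (May) instead of yearmon values, and showing only column x for brevity:

```r
# Stand-in for the example data: three groups, months Jan (1) .. May (5)
grp   <- rep(1:3, each = 5)
month <- rep(1:5, times = 3)
x     <- 1:15

ref_month <- 3  # March
# For every row, the index of its own group's March row
ref_row <- which(month == ref_month)[match(grp, grp[month == ref_month])]
# Rows from March onward take the March value; earlier rows keep their own
x_new <- ifelse(month >= ref_month, x[ref_row], x)
x_new  # 1 2 3 3 3 6 7 8 8 8 11 12 13 13 13
```

Because the lookup is keyed on the March row rather than on row position, it does not depend on the sort order; a group with no March row would get NA for its later rows.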

Coming back to this, I think this is a pretty clean way to do it (assuming the data are sorted):

cols = c("x", "y") 
dt[period >= "Mar 2014", (cols) := .SD[1L], by=grp, .SDcols = cols] 
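.SD[1L] takes the first row of each group among the rows selected by i, so it picks March only when rows are ordered by period within grp. A base-R sketch of that behaviour shows what happens when the order assumption breaks; the helper fill_from_first() and the integer month coding are assumptions made for illustration:

```r
# Hypothetical base-R stand-in for dt[sel, x := x[1L], by = grp]
fill_from_first <- function(x, grp, sel) {
  # Within the selected rows, overwrite each group with its first selected value
  x[sel] <- ave(x[sel], grp[sel], FUN = function(v) v[1L])
  x
}

grp   <- rep(1:3, each = 5)
month <- rep(1:5, times = 3)   # 1 = Jan ... 5 = May
x     <- 1:15

# Sorted by month within each group: the first selected row is March
fill_from_first(x, grp, month >= 3)
# 1 2 3 3 3 6 7 8 8 8 11 12 13 13 13

# Months reversed within each group: the first selected row is now May
ord <- order(grp, -month)
fill_from_first(x[ord], grp[ord], month[ord] >= 3)
# 5 5 5 2 1 10 10 10 7 6 15 15 15 12 11
```

If the order is not guaranteed, sorting first (for example with data.table's setorder(dt, grp, period)) restores the assumption.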

Another way is to use a rolling join:

dt[period >= "Mar 2014", c("x", "y") := 
    .SD[period == "Mar 2014"][.SD, on=.(grp, period), roll=TRUE, .(x.x, x.y)] 
] 

How the second option works

Everything below is covered in the main documentation, which you can open by typing ?data.table.

DT[i, (cols) := e] overwrites the columns cols in the rows selected by i.

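Looking more closely at e, we see .SD, which only exists inside DT[i, ...]. We can take it out of DT[i, ...] by substituting DT[i] for .SD. From there, we can simplify e step by step to see how it works: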

> mySD = dt[period >= "Mar 2014"] 
> mySD 
    grp period x y 
1: 1 Mar 2014 3 13 
2: 1 Apr 2014 4 14 
3: 1 May 2014 5 15 
4: 2 Mar 2014 8 18 
5: 2 Apr 2014 9 19 
6: 2 May 2014 10 20 
7: 3 Mar 2014 13 23 
8: 3 Apr 2014 14 24 
9: 3 May 2014 15 25 
> mySD[period == "Mar 2014"] 
    grp period x y 
1: 1 Mar 2014 3 13 
2: 2 Mar 2014 8 18 
3: 3 Mar 2014 13 23 
> mySD[period == "Mar 2014"][mySD, on=.(grp, period)] 
    grp period x y i.x i.y 
1: 1 Mar 2014 3 13 3 13 
2: 1 Apr 2014 NA NA 4 14 
3: 1 May 2014 NA NA 5 15 
4: 2 Mar 2014 8 18 8 18 
5: 2 Apr 2014 NA NA 9 19 
6: 2 May 2014 NA NA 10 20 
7: 3 Mar 2014 13 23 13 23 
8: 3 Apr 2014 NA NA 14 24 
9: 3 May 2014 NA NA 15 25 
> mySD[period == "Mar 2014"][mySD, on=.(grp, period), roll=TRUE] 
    grp period x y i.x i.y 
1: 1 Mar 2014 3 13 3 13 
2: 1 Apr 2014 3 13 4 14 
3: 1 May 2014 3 13 5 15 
4: 2 Mar 2014 8 18 8 18 
5: 2 Apr 2014 8 18 9 19 
6: 2 May 2014 8 18 10 20 
7: 3 Mar 2014 13 23 13 23 
8: 3 Apr 2014 13 23 14 24 
9: 3 May 2014 13 23 15 25 
> mySD[period == "Mar 2014"][mySD, on=.(grp, period), roll=TRUE, .(x.x, x.y)] 
    x.x x.y 
1: 3 13 
2: 3 13 
3: 3 13 
4: 8 18 
5: 8 18 
6: 8 18 
7: 13 23 
8: 13 23 
9: 13 23
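With roll = TRUE, each row of mySD is matched exactly on grp and, on period, to the latest March row at or before it; that is, the last observation is carried forward along the final join column. A base-R sketch of that matching rule using findInterval() per group; roll_lookup() is a hypothetical helper (not data.table's implementation) and months are coded as integers (3 = Mar, 4 = Apr, 5 = May):

```r
# Hypothetical helper illustrating roll = TRUE matching.
# For each query row: exact match on group, then the latest key period <= query period.
roll_lookup <- function(key_grp, key_period, key_val, grp, period) {
  vapply(seq_along(grp), function(i) {
    rows <- which(key_grp == grp[i])                   # key rows in the same group
    if (length(rows) == 0L) return(NA_real_)           # group absent from the keys
    rows <- rows[order(key_period[rows])]              # findInterval() needs sorted keys
    pos  <- findInterval(period[i], key_period[rows])  # number of keys <= period[i]
    if (pos == 0L) NA_real_ else as.numeric(key_val[rows[pos]])
  }, numeric(1))
}

# Keys: the March rows (x = 3, 8, 13); queries: Mar..May for each group
roll_lookup(key_grp = c(1, 2, 3), key_period = c(3, 3, 3), key_val = c(3, 8, 13),
            grp = rep(1:3, each = 3), period = rep(3:5, times = 3))
# 3 3 3 8 8 8 13 13 13

# A query before the only key (February) has nothing to roll from
roll_lookup(c(1, 2, 3), c(3, 3, 3), c(3, 8, 13), grp = 1, period = 2)
# NA
```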