2016-12-01 25 views

How to replicate observations based on weight

Suppose we have

library(data.table) 
dt <- data.table(id = 1:4, x1 = 10:13, x2=21:24, wt=c(1,0,0.5,0.7)) 

which returns

id x1 x2 wt 
1: 1 10 21 1.0 
2: 2 11 22 0.0 
3: 3 12 23 0.5 
4: 4 13 24 0.7 

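I want to replicate observations under the following conditions:

  1. If wt is 0 or 1, we assign flag equal to 1 and 0 respectively.
  2. If 0 < wt < 1, we assign flag equal to 0. In addition, we duplicate this observation with wt = 1-wt and assign flag equal to 1 to the copy.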

The output I expect is

id x1 x2 wt flag 
1: 1 10 21 1.0 0 
2: 2 11 22 0.0 1 
3: 3 12 23 0.5 0 
4: 3 12 23 0.5 1 
5: 4 13 24 0.7 0 
6: 4 13 24 0.3 1 

I tried the following code

dt[,flag:=ifelse(wt==1,0, ifelse(wt==0, 1, 0))] 
dt[,freq:=ifelse(wt > 0 & wt < 1, 2, 1)] 
dtr <- dt[rep(1:.N, freq)][,Indx:=1:.N, by = id] 
dtr[freq==2&Indx==2, wt:=1-wt] 
dtr[Indx==2,flag:=1] 
dtr[,`:=`(freq=NULL, Indx=NULL)] 

However, I don't think it is efficient.

Do you have any suggestions?

Working row-wise in 'data.table' will be inefficient. I'd suggest a 'base' R solution.

Answers


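We can change a few steps to make this more compact: drop the ifelse and coerce the logical vector directly to binary with unary +, replicate the rows without creating a helper column, and then get the row index ('i1') of the second row in each id group to assign the values in 'flag' and 'wt'.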

dt1 <- dt[, flag := +(wt == 0)][rep(1:.N, (wt > 0 & wt < 1) +1)][] 
i1 <- dt1[, .I[seq_len(.N)==2], id]$V1 
dt1[i1, c('flag', 'wt') := .(1, 1-wt)][] 
# id x1 x2 wt flag 
#1: 1 10 21 1.0 0 
#2: 2 11 22 0.0 1 
#3: 3 12 23 0.5 0 
#4: 3 12 23 0.5 1 
#5: 4 13 24 0.7 0 
#6: 4 13 24 0.3 1 
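The two idioms that answer relies on can be seen in isolation with plain vectors. This is only a minimal base R sketch for illustration; the names `times` and `idx` are not part of the answer above.

```r
wt <- c(1, 0, 0.5, 0.7)

# Unary plus coerces a logical vector to integer 0/1,
# so this is 1 where wt == 0 and 0 everywhere else.
flag <- +(wt == 0)

# Adding 1 to a logical gives the number of copies per row:
# 2 when 0 < wt < 1, otherwise 1.
times <- (wt > 0 & wt < 1) + 1

# rep() with a vector of counts repeats each index that many times;
# the result can be used to subset rows.
idx <- rep(seq_along(wt), times)
```

Subsetting a table by `idx` produces the duplicated rows, which is what `dt[rep(1:.N, ...)]` does inside data.table.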

Here is an approach using a data frame:

dt <- data.frame(id = 1:4, x1 = 10:13, x2=21:24, wt=c(1,0,0.5,0.7)) 

# create the flag column 
dt$flag = 1 - ceiling(dt$wt) 

#create a new data frame with the rows that fulfill condition 2 
dt2 = dt[dt$wt < 1 & dt$wt > 0, ] 
dt2$wt = 1 - dt2$wt 
dt2$flag = 1 

#rbind it to the original data frame and reorder by id 
dt = rbind(dt,dt2) 
dt = dt[order(dt$id),] 

Result:

id x1 x2 wt flag 
1 1 10 21 1.0 0 
2 2 11 22 0.0 1 
3 3 12 23 0.5 0 
31 3 12 23 0.5 1 
4 4 13 24 0.7 0 
41 4 13 24 0.3 1 
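A possible variant of the same base R idea skips the rbind and the reordering by replicating row indices in place. It is a sketch that assumes id is unique in the input, so that `duplicated()` marks exactly the added copies; the names `df`, `split_wt`, `dup` and `out` are illustrative only.

```r
df <- data.frame(id = 1:4, x1 = 10:13, x2 = 21:24, wt = c(1, 0, 0.5, 0.7))

# TRUE for rows that must be split into two observations
split_wt <- df$wt > 0 & df$wt < 1

# Repeat those rows twice and all other rows once, keeping the original order
out <- df[rep(seq_len(nrow(df)), split_wt + 1), ]

# Assumes id is unique in df: the second copy of each id is the added row
dup <- duplicated(out$id)

out$flag <- as.numeric(out$wt == 0)   # flag is 1 only where wt == 0
out$wt[dup] <- 1 - out$wt[dup]        # the copy carries the complementary weight
out$flag[dup] <- 1                    # and is flagged
rownames(out) <- NULL                 # drop the "3.1"-style row names
```

Because the copies are created next to their originals, no sort by id is needed afterwards.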

A tidyverse approach:

library(dplyr) 

# dt is the original four-column data from the question 
dt2 <- dt %>% 
    mutate(flag = if_else(wt == 0, 1, 0, missing = NULL)) %>% 
    mutate(flag = if_else(wt == 1, 0, flag, missing = NULL)) %>% 
    mutate(flag2 = if_else(wt %in% c(1,0), 1, 2, missing = NULL)) %>% 
    slice(rep(1:n(), flag2)) %>% 
    group_by(id) %>% 
    mutate(wt = if_else(row_number() == 2, 1-wt, wt, missing = NULL)) %>% 
    mutate(flag = if_else(row_number() == 2, 1, flag, missing = NULL)) %>% 
    select(id, x1, x2, wt, flag) 

which gives

#Source: local data frame [6 x 5] 
#Groups: id [4] 
# 
#  id x1 x2 wt flag 
# <int> <int> <int> <dbl> <dbl> 
#1  1 10 21 1.0  0 
#2  2 11 22 0.0  1 
#3  3 12 23 0.5  0 
#4  3 12 23 0.5  1 
#5  4 13 24 0.7  0 
#6  4 13 24 0.3  1 

P.S. Only the duplicated copy should be modified, so the condition is row_number() == 2. With row_number() == 1, the ids that have a single row (wt equal to 0 or 1) would also have their wt and flag overwritten.