2015-11-19 31 views
1

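R: recode the n observations before/after a value

I have a data frame of 0/1 dummy variables. Each dummy variable takes the value 1 exactly once. For each column, I want to set the n observations before and the n observations after the 1 to a given value (say 1).

So for a single vector, with n = 1:

c(0, 0, 1, 0, 0) 

I would want

c(0, 1, 1, 1, 0) 

What would be a good general approach that works for n columns and allows a different number of observations to be replaced before and after (e.g. n before and n-1 after)?

Thanks for your help!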

+0

Something like 'as.numeric(filter(x, rep(1, 3), circular = TRUE))'.
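The moving-sum idea in this comment can be tried directly. The snippet below is a minimal sketch that assumes 'filter' refers to 'stats::filter' from base R; the variable names are illustrative only. It also shows that 'circular = TRUE' wraps around the ends of the vector, which may or may not be wanted when the 1 sits at the first or last position.

```r
# Centered moving sum over a window of 3 (one observation before, one after)
x <- c(0, 0, 1, 0, 0)
centered <- as.numeric(stats::filter(x, rep(1, 3), circular = TRUE))
centered
# 0 1 1 1 0

# With the 1 at the edge, circular = TRUE wraps around to the other end
edge <- c(1, 0, 0, 0, 0)
wrapped <- as.numeric(stats::filter(edge, rep(1, 3), circular = TRUE))
wrapped
# 1 1 0 0 1
```

If the wrap-around is not wanted, one of the index-based answers with explicit bounds is probably the safer choice.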

Answers

1

Another option:

f <- function(x, pre, post) { 
    idx <- which.max(x) 
    x[max(1, (idx-pre)):min(length(x), (idx+post))] <- 1 
    x 
} 

Sample data:

df <- data.frame(x = c(0, 0, 1, 0, 0), y = c(0, 1, 0, 0, 0)) 

Apply it to every column:

df[] <- lapply(df, f, pre=2, post=1) 
#df 
# x y 
#1 1 1 
#2 1 1 
#3 1 1 
#4 1 0 
#5 0 0 
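The question also asks for different window sizes. One way to get a different window per column is sketched below, assuming base R's 'Map' is used to pair each column with its own 'pre' and 'post' value. The per-column vectors are hypothetical values chosen for illustration, and 'f' is repeated so the snippet runs on its own.

```r
# f as in the answer above, repeated so the snippet is self-contained
f <- function(x, pre, post) {
    idx <- which.max(x)
    x[max(1, idx - pre):min(length(x), idx + post)] <- 1
    x
}

df <- data.frame(x = c(0, 0, 1, 0, 0), y = c(0, 1, 0, 0, 0))

# Hypothetical per-column window sizes: one entry per column of df
pre  <- c(2, 1)
post <- c(1, 0)

# Map calls f(df[[i]], pre[i], post[i]) for each column i
df[] <- Map(f, df, pre, post)
df
#   x y
# 1 1 1
# 2 1 1
# 3 1 0
# 4 1 0
# 5 0 0
```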
1

What you can do is:

vec <- c(0, 0, 1, 0, 0) 

sapply(seq_along(vec), function(i) { 
    # clamp the window to valid positions 1..length(vec) 
    minval <- max(1, i - 1) 
    maxval <- min(i + 1, length(vec)) 
    return(sum(vec[minval:maxval])) 
}) 
# [1] 0 1 1 1 0 

Or wrap it in a function (same code, just a bit more compact):

f <- function(vec){ 
    sapply(seq_along(vec), function(i) 
       sum(vec[max(1, i-1):min(i+1, length(vec))])) 
} 

f(vec) 
# [1] 0 1 1 1 0 

SPEEDTEST

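To compare the different solutions, I quickly ran a benchmark with microbenchmark, and the winner appears to be @Shenglin's code. It is always nice to see simple solutions (and to see how complicated (my) solutions can get). One caveat: the call below passes 'fShenglin' without '(vect)', so that row times the lookup of the function object rather than a run of the function, and its near-zero timings are not comparable with the other two rows.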

fDavid <- function(vec){ 
    sapply(1:length(vec), function(i) 
    sum(vec[max(0, i-1):min(i+1, length(vec))])) 
} 
fHeroka <- function(vec){ 
    res <- vec 
    test <- which(vec==1) 

    #create indices to be replaced 
    n <- 1 #variable n 
    replace_indices <- c(test+(1:n),test-(1:n)) 
    #filter out non-positive indices (may happen with larger n) 
    replace_indices <- replace_indices[replace_indices>0] 
    #replace items in 'res' that need to be replaced with 1 
    res[replace_indices] <- 1 
    res #return the modified vector 
} 
fShenglin <- function(vec){ 
    ind <- which(vec==1) 
    vec[(ind-1):(ind+1)] <- 1 
    vec #return the modified vector 
} 

vect <- sample(0:1, size = 1000, replace = TRUE) 

library(microbenchmark) 
microbenchmark(fHeroka(vect), fDavid(vect), fShenglin) 
# Unit: nanoseconds 
#           expr     min      lq       mean  median        uq     max neval cld 
#  fHeroka(vect)   38929   42999   54422.57   49546   61755.5  145451   100  a 
#   fDavid(vect) 2463805 2577935 2875024.99 2696844 2849548.5 5994596   100   b 
#      fShenglin       0       0     138.63       1     355.0    1063   100  a 
# Warning message: 
# In microbenchmark(fHeroka(vect), fDavid(vect), fShenglin) : 
# Could not measure a positive execution time for 30 evaluations. 
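A sketch of how such a comparison could be rerun with every candidate actually invoked, on an input that matches the question (exactly one 1). The function names 'f_window', 'f_indices' and 'f_range' are hypothetical re-implementations of the three approaches with bounds checks added, not the answerers' original code, and no timings are shown because they depend on the machine.

```r
# Moving-sum approach (hypothetical re-implementation)
f_window <- function(vec, n = 1) {
    sapply(seq_along(vec), function(i)
        sum(vec[max(1, i - n):min(i + n, length(vec))]))
}

# Index-set approach (hypothetical re-implementation)
f_indices <- function(vec, n = 1) {
    idx <- which(vec == 1)
    repl <- c(idx + seq_len(n), idx - seq_len(n))
    repl <- repl[repl > 0 & repl <= length(vec)]
    vec[repl] <- 1
    vec
}

# Range approach (hypothetical re-implementation)
f_range <- function(vec, n = 1) {
    idx <- which(vec == 1)
    vec[max(1, idx - n):min(idx + n, length(vec))] <- 1
    vec
}

# Test input with a single 1, as described in the question
vect <- numeric(1000)
vect[500] <- 1

# Check that all three agree before timing them
stopifnot(identical(f_window(vect), f_indices(vect)),
          identical(f_indices(vect), f_range(vect)))

# Time them only if the microbenchmark package is installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
    print(microbenchmark::microbenchmark(
        f_window(vect), f_indices(vect), f_range(vect), times = 100))
}
```

Checking that the candidates return identical results first avoids comparing the speed of functions that do different work.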
0

This might be a start:

myv <- c(0, 0, 1, 0, 0) 

#make a copy 
res <- myv 

#check where the ones are 
test <- which(myv==1) 

#create indices to be replaced 

n <- 1 #variable n 
replace_indices <- c(test+(1:n),test-(1:n)) 
#keep only valid positions (out-of-range indices may happen with larger n) 
replace_indices <- replace_indices[replace_indices>0 & replace_indices<=length(myv)] 
#replace items in 'res' that need to be replaced with 1 

res[replace_indices] <- 1 
res 

    > res 
    [1] 0 1 1 1 0 
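The index-building step above can be generalized to asymmetric windows. The following is a minimal sketch, assuming base R's 'outer' is used to add every offset from '-pre' to 'post' to every position of a 1; 'recode_window' is a hypothetical name. As a side effect it also copes with vectors that contain more than one 1, which goes beyond what the question requires.

```r
recode_window <- function(x, pre, post) {
    ones <- which(x == 1)
    # every position of a 1 combined with every offset in -pre..post
    idx <- as.vector(outer(ones, seq(-pre, post), "+"))
    # drop positions that fall outside the vector
    idx <- idx[idx >= 1 & idx <= length(x)]
    x[idx] <- 1
    x
}

recode_window(c(0, 0, 1, 0, 0), pre = 1, post = 1)
# 0 1 1 1 0
recode_window(c(0, 0, 1, 0, 0, 0, 0, 1), pre = 2, post = 1)
# 1 1 1 1 0 1 1 1
```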
3
x<-c(0,0,1,0,0) 
ind<-which(x==1) 
x[(ind-1):(ind+1)]<-1 
+0

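Very simple and fast solution. However, you are missing a check: try running the code on the vector 'x <- c(1,0,1,0,0,1)'. You need to make sure the indices derived from 'ind' stay above 0 and do not exceed 'length(x)'. – David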

+0

You can do it with this line: 'x[max(1, ind-1):min(ind+1, length(x))] <- 1' – David
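To illustrate why the bounds check in these comments matters, here is a small sketch with the 1 in the last position. The variable names 'unguarded' and 'guarded' are illustrative only.

```r
x <- c(0, 0, 0, 0, 1)
ind <- which(x == 1)

# Without a check, assigning past the end silently grows the vector
unguarded <- x
unguarded[(ind - 1):(ind + 1)] <- 1
length(unguarded)
# 6

# Clamping the range keeps the original length
guarded <- x
guarded[max(1, ind - 1):min(ind + 1, length(x))] <- 1
guarded
# 0 0 0 1 1
```

In a data frame, a column that silently changes length would no longer line up with the other columns, so the clamped form is the safer default.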

0

This might be a solution:

dat<-data.frame(x=c(0,0,1,0,0,0),y=c(0,0,0,1,0,0),z=c(0,1,0,0,0,0)) 
which_to_change<-data.frame(prev=c(2,2,1),foll=c(1,1,3)) 
for(i in 1:nrow(which_to_change)){ 
    dat[(which(dat[,i]==1)-which_to_change[i,1]):(which(dat[,i]==1)+which_to_change[i,2]),i]<-1 
} 