Index on a condition in R

I have data similar to the following:

 Time output 
     2  1 
     2  1 
     2  2 
     2  2 
     2  1 
     2  2 
     2  1

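I need to add two columns to this data:

index: whenever output==2 it should start counting, and the count should stay the same until it meets a 1. When it then meets another 2, the count should increase.
total time: should sum Time over each run of output==2 between the 1s.

Expected output: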

 Time output index total_time 
     2  1  0   0 
     2  1  0   0 
     2  2  1   4 
     2  2  1   4 
     2  1  0   0 
     2  2  2   2 
     2  1  0   0

Thanks in advance.

Source

2016-07-21 amy

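Here is a solution based on rle and cumsum. I am adding comments to explain the main steps, even though it is hard to explain in words. The solution is vectorized, with no loops.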

## initialise the result columns with zeros
dx$index <- rep(0, nrow(dx))
dx$total_time <- rep(0, nrow(dx))
## use rle to get the value and length of each run
rr <- rle(dx$output)
## only the runs of 2 matter to us, so flag them
ii <- rr$values == 2
## number the runs of 2 with cumsum, then repeat each number
## once per row of its run
vals <- cumsum(ii)[ii]
occurs <- rr$lengths[ii]
dx$index[dx$output == 2] <- rep(vals, occurs)
## same idea for the total; occurs * 2 relies on every Time being 2,
## as in the sample data
dx$total_time[dx$output == 2] <- rep(occurs * 2, occurs)

#  Time output index  total_time 
# 1 2  1  0   0 
# 2 2  1  0   0 
# 3 2  2  1   4 
# 4 2  2  1   4 
# 5 2  1  0   0 
# 6 2  2  2   2 
# 7 2  1  0   0

where dx is read as:

dx <- read.table(text=" Time output
     2  1
     2  1
     2  2
     2  2
     2  1
     2  2
     2  1", header=TRUE)
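To make the rle/cumsum steps easier to follow, here is a small sketch that prints the intermediate values on the sample output column. The last part is my own assumption rather than part of the answer: it uses a made-up, non-constant Time vector and sums the actual Time values per run instead of multiplying the run length by 2.

```r
# Sample output column from the question
output <- c(1, 1, 2, 2, 1, 2, 1)

# rle() collapses consecutive equal values into runs
rr <- rle(output)
rr$values    # 1 2 1 2 1
rr$lengths   # 2 2 1 1 1

# Flag the runs made of 2s, then number them 1, 2, ... in order
ii <- rr$values == 2
vals <- cumsum(ii)[ii]     # 1 2
occurs <- rr$lengths[ii]   # 2 1

# Expand each run number back to one entry per row of its run
index_for_twos <- rep(vals, occurs)   # 1 1 2

# Assumption: hypothetical, non-constant times (not the question's data)
Time <- c(2, 3, 1, 4, 2, 6, 2)
# One id per run, repeated for every row of that run
run_id <- rep(seq_along(rr$lengths), rr$lengths)   # 1 1 2 2 3 4 5
# Sum Time within each run, keeping the sum only on the rows where output is 2
total_time <- ifelse(output == 2, ave(Time, run_id, FUN = sum), 0)
total_time   # 0 0 5 5 0 6 0
```

With the question's data, where every Time is 2, the last step gives the same 0 0 4 4 0 2 0 as the answer.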

Source

2016-07-21 01:36:28 agstudy

Thank you, this works! – amy

Using some indexing and stuffing about:

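dat[c("index","total_time")] <- 0
hit <- dat$output==2
## cumsum(!hit) is constant within each run of 2s; the factor codes renumber the runs 1, 2, ...
dat$index[hit] <- as.integer(factor(cumsum(!hit)[hit]))
## sum Time within each run of 2s
dat$total_time[hit] <- with(dat[hit,], ave(Time, index, FUN=sum))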

# Time output index total_time 
#1 2  1  0   0 
#2 2  1  0   0 
#3 2  2  1   4 
#4 2  2  1   4 
#5 2  1  0   0 
#6 2  2  2   2 
#7 2  1  0   0
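The cumsum(!hit)[hit] step is the least obvious part, so here is a rough walk-through of what it produces on the sample output column. The variable names grp and idx are mine, introduced only for illustration.

```r
# Sample output column from the question
output <- c(1, 1, 2, 2, 1, 2, 1)
hit <- output == 2

# cumsum(!hit) counts the 1s seen so far, so it stays constant
# across each consecutive run of 2s
grp <- cumsum(!hit)   # 1 2 2 2 3 3 4

# Keep only the positions where output is 2
grp[hit]              # 2 2 3

# The integer codes of a factor relabel the distinct values
# as 1, 2, ... in sorted order
idx <- as.integer(factor(grp[hit]))   # 1 1 2
```

Because the group values only ever increase, the sorted order of the factor levels matches the order in which the runs appear.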

Source

2016-07-21 01:43:53 thelatemail

Thank you, this works! – amy

This is much simpler than the accepted solution. – agstudy

@agstudy - yours is noticeably faster though (e.g. on 1M+ rows) – thelatemail

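Here is an option using data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)) and create 'index' by applying rleid to the logical vector (output == 2). Where 'index' is not 0, reassign 'index' as the match between its elements and its unique values. Then create 'total_time' as the sum of 'Time' grouped by 'index', again only where 'index' is not 0. If needed, the remaining NA elements can be converted to 0.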

library(data.table) 
setDT(df1)[, index:= rleid(output ==2)*(output ==2) 
      ][index!=0, index := match(index, unique(index)) 
      ][index!=0, total_time :=sum(Time) , index 
      ][is.na(total_time), total_time := 0] 
df1 
# Time output index total_time 
#1: 2  1  0   0 
#2: 2  1  0   0 
#3: 2  2  1   4 
#4: 2  2  1   4 
#5: 2  1  0   0 
#6: 2  2  2   2 
#7: 2  1  0   0
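As a rough illustration of the chain above, the same steps can be run on a plain vector to see each intermediate value. This sketch assumes the data.table package is installed; the names is_two, run and raw are mine, not from the answer.

```r
library(data.table)

# Sample output column from the question
output <- c(1, 1, 2, 2, 1, 2, 1)
is_two <- output == 2

# rleid() gives every run of identical values its own id
run <- rleid(is_two)   # 1 1 2 2 3 4 5

# Multiplying by the logical vector zeroes out the runs of 1s
raw <- run * is_two    # 0 0 2 2 0 4 0

# match() against the unique non-zero ids renumbers them 1, 2, ...
index <- raw
index[raw != 0] <- match(raw[raw != 0], unique(raw[raw != 0]))
index                  # 0 0 1 1 0 2 0
```

The renumbering is needed because the run ids of the 2s (here 2 and 4) skip over the ids taken by the runs of 1s.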

Source

2016-07-21 02:56:58 akrun
