2014-01-24

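Group data based on a threshold?

Say I have a response variable that rises and falls over time. Each time the response variable exceeds a threshold, we have a new "trial". That is, if I add a column `Threshold` that is `TRUE` whenever the response is above some value, each contiguous block of data points where `Threshold` is `TRUE` constitutes a new trial.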

Time <- seq(1, 10, by = 0.5) 
Response <- abs(sin(Time)) 
Threshold <- Response > 0.6 
data <- data.frame(Time, Response, Threshold) 

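Given `Time`, `Response`, and `Threshold`, how can I add a new factor `Trial` that takes a new value for each group of consecutive `TRUE` thresholds? Something like this: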

Time Response Threshold Trial 
1 1.0 0.84147098  TRUE A 
2 1.5 0.99749499  TRUE A 
3 2.0 0.90929743  TRUE A 
4 2.5 0.59847214  FALSE NA 
5 3.0 0.14112001  FALSE NA 
6 3.5 0.35078323  FALSE NA 
7 4.0 0.75680250  TRUE B 
8 4.5 0.97753012  TRUE B 
9 5.0 0.95892427  TRUE B 
10 5.5 0.70554033  TRUE B 
11 6.0 0.27941550  FALSE NA 
12 6.5 0.21511999  FALSE NA 
13 7.0 0.65698660  TRUE C 
14 7.5 0.93799998  TRUE C 
15 8.0 0.98935825  TRUE C 
16 8.5 0.79848711  TRUE C 
17 9.0 0.41211849  FALSE NA 
18 9.5 0.07515112  FALSE NA 
19 10.0 0.54402111  FALSE NA 

Answers

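# cumsum(!Threshold) only increases on FALSE rows, so it is constant within
# each run of TRUE rows; FALSE rows are set to NA and the run ids relabelled.
data$Trial <- factor(
    ifelse(data$Threshold, cumsum(!data$Threshold), NA), labels = c("A", "B", "C")
)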

## Time Response Threshold Trial 
## 1 1.0 0.84147098  TRUE  A 
## 2 1.5 0.99749499  TRUE  A 
## 3 2.0 0.90929743  TRUE  A 
## 4 2.5 0.59847214  FALSE <NA> 
## 5 3.0 0.14112001  FALSE <NA> 
## 6 3.5 0.35078323  FALSE <NA> 
## 7 4.0 0.75680250  TRUE  B 
## 8 4.5 0.97753012  TRUE  B 
## 9 5.0 0.95892427  TRUE  B 
## 10 5.5 0.70554033  TRUE  B 
## 11 6.0 0.27941550  FALSE <NA> 
## 12 6.5 0.21511999  FALSE <NA> 
## 13 7.0 0.65698660  TRUE  C 
## 14 7.5 0.93799998  TRUE  C 
## 15 8.0 0.98935825  TRUE  C 
## 16 8.5 0.79848711  TRUE  C 
## 17 9.0 0.41211849  FALSE <NA> 
## 18 9.5 0.07515112  FALSE <NA> 
## 19 10.0 0.54402111  FALSE <NA> 
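
The answer above hard-codes three labels, which matches this data set but not one with a different number of trials. A minimal sketch of one way to derive the labels from the data instead, assuming there are at most 26 trials so that `LETTERS` suffices; the names `run_id` and `n_trials` are illustrative and not part of the original answer:

```r
# Rebuild the example data from the question.
Time <- seq(1, 10, by = 0.5)
Response <- abs(sin(Time))
Threshold <- Response > 0.6

# cumsum(!Threshold) is constant within each contiguous run of TRUE rows,
# so it works as a run id; rows below the threshold become NA.
run_id <- ifelse(Threshold, cumsum(!Threshold), NA)

# Count the distinct runs and take that many letters as labels
# (assumption: no more than 26 trials).
n_trials <- length(unique(run_id[!is.na(run_id)]))
Trial <- factor(run_id, labels = LETTERS[seq_len(n_trials)])

data <- data.frame(Time, Response, Threshold, Trial)
print(data)
```

For more than 26 trials, numeric labels such as `seq_len(n_trials)` could be used in place of `LETTERS`.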

Another possibility, using `rle`:

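# Run-length encode Threshold: one entry per run of identical values
r <- with(data, rle(Threshold))
# Keep only the lengths of the TRUE runs
len <- with(r, lengths[values])
n <- length(len)

# One letter per TRUE run, repeated for the length of that run
trial <- rep(x = LETTERS[seq_len(n)], times = len)

# Fill only the rows above the threshold; the remaining rows stay NA
data$Trial[data$Threshold] <- trial

data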
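
The same `rle` idea can be wrapped in a reusable function. The sketch below is one possible packaging, not the original answer's code: `label_trials` is a hypothetical helper name, and it again assumes at most 26 trials. It numbers the `TRUE` runs inside the run-length encoding and expands them back with `inverse.rle`:

```r
# Hypothetical helper: label each contiguous run of TRUE values with a letter.
label_trials <- function(above) {
  r <- rle(above)
  # Number the TRUE runs 1, 2, ...; FALSE runs become NA.
  r$values <- ifelse(r$values, cumsum(r$values), NA_integer_)
  # Expand the run-level ids back to one id per row.
  ids <- inverse.rle(r)
  n <- max(c(0, ids), na.rm = TRUE)
  # Assumption: n <= 26, so LETTERS provides enough labels.
  factor(LETTERS[ids], levels = LETTERS[seq_len(n)])
}

# Usage with the example data from the question.
Time <- seq(1, 10, by = 0.5)
Response <- abs(sin(Time))
Threshold <- Response > 0.6
trial_labels <- label_trials(Threshold)
print(data.frame(Time, Response, Threshold, Trial = trial_labels))
```

Returning a factor with explicit levels keeps the result usable even when no value exceeds the threshold, in which case every row is `NA`.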

+1. This is faster than Jake's answer, especially as the data gets larger. It can be optimized further. See here: https://gist.github.com/mrdwab/8601445 – A5C1D2H2I1M1N2O1R2T1


@AnandaMahto, thanks for your comment and suggestions for improvement! – Henrik