2016-11-30 54 views

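R: Efficiently checking adjacent elements in a data frame

I have a data frame, Phys. It contains a time column and two columns of other variables, like this: [image: data frame "Phys"]

At some point in time both variables reach certain thresholds (e.g. etagt > 0.5 and etco2 > 2.5). I need to report the first time at which both values are above their thresholds and stay above them for at least the following 9 elements (90 seconds). I am looking for the most efficient way to "test" the following 9 elements to determine whether they meet the criteria.

Currently, I have the following code: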

# Find all instances of the relevant heuristic
tempalgEval = which(Phys$etagt > 0.5 & Phys$etco2 > 2.5)
# Reduce tempalgEval by length 9 to avoid index error when searching data frame
tempalgEval = head(tempalgEval, length(tempalgEval) - 9)

if (length(tempalgEval) < 9) {
  algEval = tempalgEval
} else {
  for (m in tempalgEval) {
    if ((
      Phys$etagt[m + 1] > 0.5 &
      Phys$etagt[m + 2] > 0.5 &
      Phys$etagt[m + 3] > 0.5 &
      Phys$etagt[m + 4] > 0.5 &
      Phys$etagt[m + 5] > 0.5 &
      Phys$etagt[m + 6] > 0.5 &
      Phys$etagt[m + 7] > 0.5 &
      Phys$etagt[m + 8] > 0.5 &
      Phys$etagt[m + 9] > 0.5
    ) |
    (
      Phys$etco2[m + 1] > 2.5 &
      Phys$etco2[m + 2] > 2.5 &
      Phys$etco2[m + 3] > 2.5 &
      Phys$etco2[m + 4] > 2.5 &
      Phys$etco2[m + 5] > 2.5 &
      Phys$etco2[m + 6] > 2.5 &
      Phys$etco2[m + 7] > 2.5 &
      Phys$etco2[m + 8] > 2.5 &
      Phys$etco2[m + 9] > 2.5
    )) {
      algEval = tempalgEval
    }
  }
}
if (length(algEval) > 0) {
  algTime = min(Phys$time[algEval], na.rm = T)
} else {
  algTime = NA
}

Thanks in advance.

Edit: minimal working data set

structure(
    list(
    time = c(
     1070, 
     1080, 
     1090, 
     1100, 
     1110, 
     1120, 
     1130, 
     1160, 
     1170, 
     1180, 
     1190, 
     1200, 
     1210, 
     1220, 
     1230, 
     1240, 
     1250, 
     1260, 
     1270, 
     1280, 
     1290, 
     1300, 
     1310, 
     1320, 
     1330, 
     1340, 
     1350, 
     1360, 
     1370, 
     1380, 
     1390 
    ), 
    etagt = c(
     0, 
     0, 
     0, 
     0, 
     0, 
     0, 
     0, 
     2.92, 
     2.33379310344828, 
     1.74758620689655, 
     1.21689655172414, 
     1.18586206896552, 
     1.1548275862069, 
     1.11965517241379, 
     1.06793103448276, 
     1.01620689655172, 
     0.997586206896552, 
     1.05620689655172, 
     1.1148275862069, 
     1.16241379310345, 
     1.19344827586207, 
     1.22448275862069, 
     1.23655172413793, 
     1.22965517241379, 
     1.22275862068966, 
     1.74965517241379, 
     2.63241379310345, 
     3.5151724137931, 
     3.59655172413793, 
     3.33448275862069, 
     3.07241379310345 
    ), 
    etco2 = c(
     0, 
     0.871379310344828, 
     2.11620689655172, 
     3.36103448275862, 
     2.61413793103448, 
     1.36931034482759, 
     0.124482758620689, 
     0, 
     1.5448275862069, 
     3.08965517241379, 
     4.49379310344828, 
     4.63172413793103, 
     4.76965517241379, 
     4.92620689655172, 
     5.15724137931034, 
     5.38827586206897, 
     5.53551724137931, 
     5.48724137931034, 
     5.43896551724138, 
     5.37551724137931, 
     5.28931034482759, 
     5.20310344827586, 
     5.16, 
     5.16, 
     5.16, 
     4.15034482758621, 
     2.46758620689655, 
     0.784827586206896, 
     1.56896551724138, 
     3.41034482758621, 
     5.25172413793103 
    ) 
), 
    .Names = c("time", 
      "etagt", "etco2"), 
    row.names = c(
    108L, 
    109L, 
    110L, 
    111L, 
    112L, 
    113L, 
    114L, 
    117L, 
    118L, 
    119L, 
    120L, 
    121L, 
    122L, 
    123L, 
    124L, 
    125L, 
    126L, 
    127L, 
    128L, 
    129L, 
    130L, 
    131L, 
    132L, 
    133L, 
    134L, 
    135L, 
    136L, 
    137L, 138L, 139L, 140L), class = "data.frame") 
Hi @Floo0, I have added the data set. For some reason I could not get it into a code block, sorry about that. Hopefully this is enough. Thanks

Answer

You can do it as follows:

library(data.table)
# dat is assumed to hold the data set from the question (Phys)
setDT(dat)
# tr := both thresholds reached
dat[, tr := etagt > 0.5 & etco2 > 2.5]
# Get a grouping variable for consecutive runs of tr, see ?rleid
dat[, run := rleid(tr)]
# Get the runs that were long enough
# 10 means the first row and the 9 following rows were > threshold
# With N >= 9 you would get 2 matches, because the leading FALSE run has 9 rows
ind <- dat[, .N, run][N >= 10]
# Get the first timing per run
dat[ind, on = "run", mult = "first"]

This gives you:

   time    etagt    etco2   tr run  N
1: 1180 1.747586 3.089655 TRUE   2 17

To see what is going on, have a look at 'dat', 'dat[, .N, run]' and 'ind'.
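For readers without data.table, the same run-length idea can be sketched in base R with rle(). This is only an illustration, not the answerer's code: the helper name first_sustained_time and the toy data are made up here, and unlike the snippet above it only considers runs where the condition is TRUE.

```r
# Hypothetical helper (not from the thread): first time at which `cond`
# is TRUE for at least `n` consecutive rows, or NA if that never happens.
first_sustained_time <- function(time, cond, n = 10) {
  cond[is.na(cond)] <- FALSE                         # treat missing readings as "not met"
  runs   <- rle(cond)                                # lengths and values of consecutive runs
  starts <- cumsum(c(1L, head(runs$lengths, -1L)))   # first row index of each run
  hit    <- which(runs$values & runs$lengths >= n)   # TRUE runs that are long enough
  if (length(hit) == 0L) return(NA_real_)
  time[starts[hit[1L]]]
}

# Made-up toy data: 10-second steps, values chosen only for illustration
toy <- data.frame(
  time  = seq(0, 140, by = 10),
  etagt = c(0, 0, 1, 1, 0, rep(1, 10)),
  etco2 = c(0, 3, 3, 3, 3, rep(3, 10))
)

# n = 10 means the first row plus the 9 following rows
toy_time <- first_sustained_time(toy$time, toy$etagt > 0.5 & toy$etco2 > 2.5, n = 10)
toy_time
```

In the toy data the short TRUE run at rows 3 to 4 is skipped and the 10-row run starting at time 50 is reported. Called on the posted data as first_sustained_time(Phys$time, Phys$etagt > 0.5 & Phys$etco2 > 2.5), it should pick out the same 17-row run starting at 1180.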

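Thanks! When I run it on my main data set, I get an output with many rows. In the main data set, the first row is FALSE and the second is TRUE. Do you know how I can identify the first TRUE instance in the table?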

Not sure whether this is the best way, but have a look at 'cummax' and then subset by taking only the rows where 'cummax == TRUE' – Rentrop
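A minimal sketch of the cummax idea, in case it helps. The flag vector is made up, and the variable names are only for illustration.

```r
# Made-up flags: does each row meet both thresholds?
tr <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)

# cummax() coerces the logicals to 0/1 and carries the maximum forward,
# so it switches to 1 at the first TRUE and stays there.
seen_true <- cummax(tr) == 1

first_true <- which(seen_true)[1]   # row index of the first TRUE (NA if there is none)
kept       <- tr[seen_true]         # the flags with the leading FALSE rows dropped
```

Note that this only drops the rows before the first TRUE; later FALSE rows are still kept, so runs of FALSE after that point would need to be filtered on the flag itself.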

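Many thanks @Floo0. As a follow-up, would there be a way to define several different thresholds and find the time for each of them? I have six different thresholds to test, and at the moment the only thing I can think of is repeating the code above (with different thresholds) six times.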
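One possible way to avoid repeating the code, sketched in base R under assumptions: the helper first_sustained_time, the threshold values and the toy data below are all made up for illustration and would need to be replaced with the real six threshold pairs and the real data frame.

```r
# Hypothetical helper: first time at which `cond` is TRUE for >= n consecutive rows
first_sustained_time <- function(time, cond, n = 10) {
  cond[is.na(cond)] <- FALSE
  runs   <- rle(cond)
  starts <- cumsum(c(1L, head(runs$lengths, -1L)))   # first row index of each run
  hit    <- which(runs$values & runs$lengths >= n)   # long enough TRUE runs
  if (length(hit) == 0L) return(NA_real_)
  time[starts[hit[1L]]]
}

# Made-up threshold pairs, one row per test
thresholds <- data.frame(
  etagt = c(0.5, 1.0, 1.5),
  etco2 = c(2.5, 2.5, 4.0)
)

# Made-up toy data, values chosen only for illustration
toy <- data.frame(
  time  = seq(0, 140, by = 10),
  etagt = c(0, 0, 1.2, 1.2, 0, rep(1.2, 10)),
  etco2 = c(0, 3, 3, 3, 3, rep(3, 10))
)

# Apply the same check once per threshold pair
thresholds$first_time <- mapply(
  function(a, b) first_sustained_time(toy$time, toy$etagt > a & toy$etco2 > b),
  thresholds$etagt, thresholds$etco2
)
thresholds
```

Keeping the thresholds in a data frame means a seventh pair is just another row, and pairs that are never met for long enough come back as NA.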