Conditional filtering of a data.frame with leading and trailing NA observations

I have a data.frame made up of observations and model predictions. A small example dataset might look like this:

myData <- data.frame(tree=c(rep("A", 20)), doy=c(seq(75, 94)), count=c(NA,NA,NA,NA,0,NA,NA,NA,NA,1,NA,NA,NA,NA,2,NA,NA,NA,NA,NA), pred=c(0,0,0,0,1,1,1,2,2,2,2,3,3,3,3,6,9,12,20,44))

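Here count holds the observations and pred holds the model predictions for the complete set of days, in effect interpolating the data (collected every 5 days) to the daily level.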

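I would like to conditionally filter this dataset so that the predictions are truncated to the same range as the observations, in effect keeping all predictions between the first and last count (that is, removing the leading and trailing rows/pred values where they correspond to NA in the count column). For this example, the ideal result would be: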

tree doy count pred 
5  A 79  0 1 
6  A 80 NA 1 
7  A 81 NA 1 
8  A 82 NA 2 
9  A 83 NA 2 
10 A 84  1 2 
11 A 85 NA 2 
12 A 86 NA 3 
13 A 87 NA 3 
14 A 88 NA 3 
15 A 89  2 3

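I have tried to solve this by combining filter with first and last. I have also considered using a conditional mutate to create a column that records whether there is an observation in a previous doy (possibly using lag), filling it with 1 or 0 and then filtering on that output, or even creating a second data.frame containing the appropriate doy ranges that could be joined to this data.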

While searching StackOverflow I came across the following questions, which seem close but are not quite what I need:

Select first observed data and utilize mutate

Conditional filtering based on the level of a factor R

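My actual dataset is much larger, with multiple trees over multiple years (each tree/year has a different observation period, depending on the elevation of the site, etc.). I currently use the dplyr package in my code, so an answer within that framework would be great, but I would be happy with any solution.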

Source

2015-06-23 GK_28

Updated with a data.table option, as you mentioned in the comments – akrun

I think you just want to restrict the rows to those falling between the first and last non-NA count values:

myData[seq(min(which(!is.na(myData$count))), max(which(!is.na(myData$count)))),] 
# tree doy count pred 
# 5  A 79  0 1 
# 6  A 80 NA 1 
# 7  A 81 NA 1 
# 8  A 82 NA 2 
# 9  A 83 NA 2 
# 10 A 84  1 2 
# 11 A 85 NA 2 
# 12 A 86 NA 3 
# 13 A 87 NA 3 
# 14 A 88 NA 3 
# 15 A 89  2 3

In dplyr syntax, grouped by the tree variable:

library(dplyr) 
myData %>% 
    group_by(tree) %>% 
    filter(seq_along(count) >= min(which(!is.na(count))) & 
     seq_along(count) <= max(which(!is.na(count)))) 
# Source: local data frame [11 x 4] 
# Groups: tree 
# 
# tree doy count pred 
# 1  A 79  0 1 
# 2  A 80 NA 1 
# 3  A 81 NA 1 
# 4  A 82 NA 2 
# 5  A 83 NA 2 
# 6  A 84  1 2 
# 7  A 85 NA 2 
# 8  A 86 NA 3 
# 9  A 87 NA 3 
# 10 A 88 NA 3 
# 11 A 89  2 3
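
The question mentions several trees over several years, so here is a minimal sketch of how the same idea might extend to that case. The `year` column and the toy data are assumptions made for illustration, not part of the original dataset. The filter uses cumulative sums rather than `min(which(...))` so that a group with no observed counts simply returns no rows, where `min()` and `max()` on an empty index would emit warnings.

```r
library(dplyr)

# Hypothetical data: two trees, one of which was never observed.
# The `year` column is an assumption that mirrors the multi-year case.
multi <- data.frame(
  tree  = rep(c("A", "B"), each = 6),
  year  = 2015,
  doy   = rep(75:80, times = 2),
  count = c(NA, 0, NA, 1, NA, NA,    # tree A: observed on doy 76 and 78
            NA, NA, NA, NA, NA, NA)  # tree B: no observations
)

trimmed <- multi %>%
  group_by(tree, year) %>%
  # Keep a row when a non-NA count exists at or before it
  # and another (or the same one) exists at or after it.
  filter(cumsum(!is.na(count)) > 0,
         rev(cumsum(rev(!is.na(count)))) > 0) %>%
  ungroup()

trimmed
```

With this toy data only tree A's rows for doy 76 to 78 remain; tree B drops out entirely because it has no observations to bound a range.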

Source

2015-06-23 20:35:30 josliber

Try

indx <- which(!is.na(myData$count)) 
    myData[seq(indx[1], indx[length(indx)]),] 
    # tree doy count pred 
    #5  A 79  0 1 
    #6  A 80 NA 1 
    #7  A 81 NA 1 
    #8  A 82 NA 2 
    #9  A 83 NA 2 
    #10 A 84  1 2 
    #11 A 85 NA 2 
    #12 A 86 NA 3 
    #13 A 87 NA 3 
    #14 A 88 NA 3 
    #15 A 89  2 3
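
If this is needed in more than one place, the indexing above could be wrapped in a small function. The sketch below is only one way to do it: the name `trim_na_edges` and its arguments are hypothetical, and it adds a guard for the case where the column has no non-NA values, since `seq(indx[1], indx[length(indx)])` fails when `indx` is empty.

```r
# Hypothetical helper: keep only the rows between the first and last
# non-NA value of one column. `col` is the column name as a string.
trim_na_edges <- function(df, col) {
  indx <- which(!is.na(df[[col]]))
  if (length(indx) == 0) {
    return(df[0, , drop = FALSE])  # nothing observed: return zero rows
  }
  df[seq(indx[1], indx[length(indx)]), , drop = FALSE]
}

# Toy data, not the data from the question
example <- data.frame(doy = 75:80, count = c(NA, 0, NA, 1, NA, NA))
trim_na_edges(example, "count")$doy
# [1] 76 77 78
```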

If this needs to be done by group (here, by tree):

ind <- with(myData, ave(!is.na(count), tree, 
      FUN=function(x) cumsum(x)>0 & rev(cumsum(rev(x))>0))) 
    myData[ind,] 
# tree doy count pred 
#5  A 79  0 1 
#6  A 80 NA 1 
#7  A 81 NA 1 
#8  A 82 NA 2 
#9  A 83 NA 2 
#10 A 84  1 2 
#11 A 85 NA 2 
#12 A 86 NA 3 
#13 A 87 NA 3 
#14 A 88 NA 3 
#15 A 89  2 3
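
To see why the two cumulative sums pick out the right rows, it may help to run the same expression on a short logical vector. The vector below is a toy example, not the question's data.

```r
# TRUE where a count was observed
observed <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)

seen_before <- cumsum(observed) > 0            # an observation at or before this position
seen_after  <- rev(cumsum(rev(observed)) > 0)  # an observation at or after this position
keep <- seen_before & seen_after

which(keep)
# [1] 2 3 4 5
```

Positions before the first observation fail the first test, positions after the last observation fail the second, and everything in between passes both.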

Or use na.trim from zoo:

library(zoo) 
do.call(rbind,by(myData, myData$tree, FUN=na.trim))

Or use data.table:

library(data.table) 
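# which() gives positions within each group; .I holds row numbers in the whole table, which would misindex .SD once there is more than one tree
setDT(myData)[, .SD[do.call(`:`, as.list(range(which(!is.na(count)))))], by = tree]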
# tree doy count pred 
#1: A 79  0 1 
#2: A 80 NA 1 
#3: A 81 NA 1 
#4: A 82 NA 2 
#5: A 83 NA 2 
#6: A 84  1 2 
#7: A 85 NA 2 
#8: A 86 NA 3 
#9: A 87 NA 3 
#10: A 88 NA 3 
#11: A 89  2 3

Source

2015-06-23 20:35:15 akrun
