Filtering data in R without loops

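I have a fairly large data frame (a few million records).
I need to filter it according to the following rule:
- For each product, delete all records that come before the fifth record after the first record with x > 0.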

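So we only care about two columns: ID and x. The data frame is sorted by ID.
This is fairly easy to do with a loop, but loops perform poorly on a data frame this large.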

How can I do this in a 'vectorized style'?

Example:
Before filtering:

ID x 
1 0 
1 0 
1 5 # First record with x>0 
1 0 
1 3 
1 4 
1 0 
1 9 # Delete all earlier records of that product 
1 0 
1 0 
1 6 
2 0 
2 1 # First record with x>0 
2 0 
2 4 
2 5 
2 8 
2 0 # Delete all earlier records of that product 
2 1 
2 3
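The "before" table above can be rebuilt as an R data frame for experimenting. This is a minimal sketch: the name `dat` is chosen to match the `ddply()` call in the answer, and storing both columns as plain numeric vectors is an assumption.

```r
# Rebuild the "before filtering" table from the question as a data frame.
# The name `dat` matches the ddply() call used in the answer.
dat <- data.frame(
  ID = rep(c(1, 2), times = c(11, 9)),
  x  = c(0, 0, 5, 0, 3, 4, 0, 9, 0, 0, 6,   # product 1
         0, 1, 0, 4, 5, 8, 0, 1, 3)         # product 2
)
```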

After filtering:

ID x 
1 9 
1 0 
1 0 
1 6 
2 0 
2 1 
2 3


2012-07-01 Tomek Tarczynski

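For split-apply-combine problems like this one, I like to use plyr. If speed becomes an issue there are other options, but in most cases plyr is easy to understand and use. I wrote a function that implements the logic above and then passed it to ddply(), which applies it to each chunk of the data defined by ID.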

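fun <- function(x, column, threshold, numplus){ 
    # Index of the first row whose value exceeds the threshold (NA if none)
    whichcol <- which(x[[column]] > threshold)[1] 
    start <- whichcol + numplus
    # No qualifying row, or fewer than numplus rows after it: drop the group
    if (is.na(start) || start > nrow(x)) return(x[0, ])
    rows <- seq(from = start, to = nrow(x)) 
    return(x[rows, ]) 
}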

Then pass it to ddply():

library(plyr) 
ddply(dat, "ID", fun, column = "x", threshold = 0, numplus = 5) 
#----- 
    ID x 
1 1 9 
2 1 0 
3 1 0 
4 1 6 
5 2 0 
6 2 1 
7 2 3
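The same split-apply-combine steps can also be written in base R without plyr. This is a minimal sketch using `split()`, `lapply()` and `do.call(rbind, ...)`; `keep_tail()` is a hypothetical helper mirroring `fun()`, and dropping a product that has no x > 0 (or too few rows after it) is an assumption, since the question does not say what should happen in that case.

```r
# Example data from the question
dat <- data.frame(
  ID = rep(c(1, 2), times = c(11, 9)),
  x  = c(0, 0, 5, 0, 3, 4, 0, 9, 0, 0, 6,
         0, 1, 0, 4, 5, 8, 0, 1, 3)
)

# Hypothetical helper: keep the rows of one group starting numplus rows
# after the first value above `threshold`; otherwise drop the whole group
keep_tail <- function(g, column = "x", threshold = 0, numplus = 5) {
  start <- which(g[[column]] > threshold)[1] + numplus
  if (is.na(start) || start > nrow(g)) return(g[0, ])
  g[start:nrow(g), ]
}

# Split by ID, apply keep_tail() to each piece, combine the pieces
pieces <- lapply(split(dat, dat$ID), keep_tail)
res <- do.call(rbind, pieces)
rownames(res) <- NULL   # renumber rows 1..n, as ddply() does
res
```

On the example data, `res` contains the same seven rows as the ddply() output shown above.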


2012-07-01 15:55:05 Chase

Thanks! It works. That's exactly what I was looking for: a clean, R-style solution.
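The question asked for a 'vectorized style', while the plyr answer still calls a function once per ID. As one of the other options for when speed matters, here is a base R sketch, not taken from the thread, that computes a single logical keep/drop vector with `ave()`. Two caveats: `ave()` still iterates over groups internally, so this avoids per-row loops rather than all looping, and dropping products with no x > 0 is an assumption.

```r
# Example data from the question
dat <- data.frame(
  ID = rep(c(1, 2), times = c(11, 9)),
  x  = c(0, 0, 5, 0, 3, 4, 0, 9, 0, 0, 6,
         0, 1, 0, 4, 5, 8, 0, 1, 3)
)

numplus <- 5

# Position of every row inside its ID group: 1, 2, 3, ...
pos <- ave(seq_along(dat$x), dat$ID, FUN = seq_along)

# Position of the first x > 0 in each group, repeated on every row of
# that group (NA when the group has no positive x)
first_pos <- ave(dat$x, dat$ID, FUN = function(v) which(v > 0)[1])

# Keep rows at or after the numplus-th row following the first x > 0;
# groups without any x > 0 get NA in first_pos and are dropped
keep <- !is.na(first_pos) & pos >= first_pos + numplus
res <- dat[keep, ]
res
```

Unlike the ddply() result, `res` keeps the original row names (8 to 11 and 18 to 20).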
