2017-10-18 347 views

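Subsetting lagged values in R

For the data table given in the example below, I want to keep, per Unique_ID, only the groups whose Difference column contains a value of 2 or more, without dropping the NA rows of those groups.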

My_data_table <- structure(list(Unique_ID = structure(c(1L, 1L, 2L, 2L, 3L, 
        3L, 3L, 4L, 4L, 4L), .Label = c("1AA", "3AA", "5AA", "6AA"), 
        class = "factor"), Distance.km. = c(1, 2.05, 2, 4, 2, 4, 7, 
        8, 9, 10), Difference = c(NA, 1.05, NA, 2, NA, 2, 3, NA, 1, 1)), 
        .Names = c("Unique_ID", "Distance.km.", "Difference"), 
        class = "data.frame", row.names = c(NA, -10L)) 
My_data_table 
Unique_ID Distance.km. Difference
1AA       1            NA
1AA       2.05         1.05
3AA       2            NA
3AA       4            2
5AA       2            NA
5AA       4            2
5AA       7            3
6AA       8            NA
6AA       9            1
6AA       10           1

Here is what I am looking for:

 My_data_table 
Unique_ID Distance.km. Difference
3AA       2            NA
3AA       4            2
5AA       2            NA
5AA       4            2
5AA       7            3
Comments

What have you tried so far?

Why is '5AA 2 NA' included in the output here?

Answers


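Convert the data to a 'data.table' (setDT(df1)) and group by 'Unique_ID'. Within each group, if the sum of the logical vector (Difference >= 2) is greater than 0, take the subset of the data.table (.SD) where 'Difference' is NA or is greater than or equal to 2.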

library(data.table) 
# df1 is the data frame from the question (My_data_table there)
setDT(df1)[, if(sum(Difference >=2, na.rm = TRUE)>0) 
       .SD[is.na(Difference)|Difference>=2], by = Unique_ID] 
#  Unique_ID Distance.km. Difference 
#1:  3AA   2   NA 
#2:  3AA   4   2 
#3:  5AA   2   NA 
#4:  5AA   4   2 
#5:  5AA   7   3 
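
The same two-step logic (keep a group only if it has a Difference of at least 2, then keep the NA rows and the rows with Difference >= 2) can also be sketched in base R without extra packages. This is only an illustrative alternative, not part of the answer above; the helper names `hit`, `group_ok`, and `result` are made up for the example, and the data frame is rebuilt from the values in the question.

```r
# Data from the question, rebuilt as a plain data frame
My_data_table <- data.frame(
  Unique_ID    = factor(c("1AA", "1AA", "3AA", "3AA", "5AA",
                          "5AA", "5AA", "6AA", "6AA", "6AA")),
  Distance.km. = c(1, 2.05, 2, 4, 2, 4, 7, 8, 9, 10),
  Difference   = c(NA, 1.05, NA, 2, NA, 2, 3, NA, 1, 1)
)

# TRUE for rows whose Difference is known and at least 2
hit <- !is.na(My_data_table$Difference) & My_data_table$Difference >= 2

# Per-group maximum of the 0/1 flag: 1 if the group has at least one hit
group_ok <- ave(as.integer(hit), My_data_table$Unique_ID, FUN = max) == 1

# Keep rows of qualifying groups that are either NA or a hit
result <- My_data_table[group_ok & (is.na(My_data_table$Difference) | hit), ]
print(result)
```

Using `ave()` keeps the group-level flag aligned with the original rows, so the final subset is a single logical index.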

A dplyr solution:

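library(dplyr) 

# df is the data frame from the question (My_data_table there)
df %>% 
    group_by(Unique_ID) %>% 
    # keep every row of a group that has at least one non-NA Difference >= 2
    filter(any(Difference >= 2 & !is.na(Difference))) 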
# # A tibble: 5 x 3 
# # Groups: Unique_ID [2] 
# Unique_ID Distance.km. Difference 
#  <fctr>  <dbl>  <dbl> 
# 1  3AA   2   NA 
# 2  3AA   4   2 
# 3  5AA   2   NA 
# 4  5AA   4   2 
# 5  5AA   7   3
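
Note that `filter(any(...))` keeps every row of a qualifying group. On the question's data that matches the requested output, because no qualifying group contains a non-NA Difference below 2. If such rows could occur and should be dropped, a row-level condition can be added. The sketch below shows the difference; the "7AA" group is a hypothetical addition for illustration only and is not in the original data, and the names `group_only` and `group_and_row` are made up for the example.

```r
library(dplyr)

# The question's data plus one HYPOTHETICAL group ("7AA") added for illustration
df <- data.frame(
  Unique_ID    = c("1AA", "1AA", "3AA", "3AA", "5AA", "5AA", "5AA",
                   "6AA", "6AA", "6AA", "7AA", "7AA", "7AA"),
  Distance.km. = c(1, 2.05, 2, 4, 2, 4, 7, 8, 9, 10, 1, 2, 5),
  Difference   = c(NA, 1.05, NA, 2, NA, 2, 3, NA, 1, 1, NA, 1, 3)
)

# Group-level condition only: all rows of qualifying groups survive,
# including the "7AA" row with Difference == 1
group_only <- df %>%
  group_by(Unique_ID) %>%
  filter(any(Difference >= 2 & !is.na(Difference))) %>%
  ungroup()

# Group-level AND row-level condition: within qualifying groups,
# keep only NA rows and rows with Difference >= 2
group_and_row <- df %>%
  group_by(Unique_ID) %>%
  filter(any(Difference >= 2, na.rm = TRUE),
         is.na(Difference) | Difference >= 2) %>%
  ungroup()
```

Which variant is appropriate depends on whether small differences inside a qualifying group should be retained; the question's example does not settle this.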