2017-04-24

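Remove rows where the data is not sequential (R, dplyr)

I have a data frame and I am trying to remove the rows where the years are not consecutive.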

Here is a sample of my data frame:

           Name Year Position Year_diff  FBv  ind1 velo_diff
1 Aaron Heilman 2005       RP         2 90.1  TRUE       0.0
2 Aaron Heilman 2003       SP        NA 89.4    NA       0.0
3  Aaron Laffey 2010       RP         1 86.8  TRUE      -0.6
4  Aaron Laffey 2009       SP        NA 87.4    NA       0.0
5  Alexi Ogando 2015       RP         2 94.5  TRUE       0.0
6  Alexi Ogando 2013       SP        NA 93.4 FALSE       0.0
7  Alexi Ogando 2012       RP         1 97.0  TRUE       1.9
8  Alexi Ogando 2011       SP        NA 95.1    NA       0.0

The expected output should be:

          Name Year Position Year_diff  FBv ind1 velo_diff
3 Aaron Laffey 2010       RP         1 86.8 TRUE      -0.6
4 Aaron Laffey 2009       SP        NA 87.4   NA       0.0
7 Alexi Ogando 2012       RP         1 97.0 TRUE       1.9
8 Alexi Ogando 2011       SP        NA 95.1   NA       0.0

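Alexi Ogando's 2011-2012 rows remain because his SP to RP sequence happens in consecutive years. Ogando's 2013-2015 SP to RP sequence does not happen in consecutive years, so those rows are dropped.

One thing that may help: wherever the years are not in consecutive order, velo_diff will be 0.0.

Would anyone know how to do this? All help is appreciated.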

Answers

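You can filter on a combined condition that checks, within each Name, whether the previous or following year exists and whether Position matches accordingly: an RP row is kept if the previous year is present, and an SP row is kept if the following year is present.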

library(dplyr) 

df <- read.table(text = 'Name  Year Position Year_diff FBv  ind1 velo_diff 
1  "Aaron Heilman" 2005  RP   2 90.1 TRUE  0.0 
2  "Aaron Heilman" 2003  SP   NA 89.4  NA  0.0 
3  "Aaron Laffey" 2010  RP   1 86.8 TRUE  -0.6 
4  "Aaron Laffey" 2009  SP   NA 87.4  NA  0.0 
5  "Alexi Ogando" 2015  RP   2 94.5 TRUE  0.0 
6  "Alexi Ogando" 2013  SP   NA 93.4 FALSE  0.0 
7  "Alexi Ogando" 2012  RP   1 97.0 TRUE  1.9 
8  "Alexi Ogando" 2011  SP   NA 95.1  NA  0.0', header = TRUE) 

df %>% group_by(Name) %>% 
    filter(((Year - 1) %in% Year & Position == 'RP') | 
      ((Year + 1) %in% Year & Position == 'SP')) 

#> Source: local data frame [4 x 7] 
#> Groups: Name [2] 
#> 
#>   Name Year Position Year_diff FBv ind1 velo_diff 
#>   <fctr> <int> <fctr>  <int> <dbl> <lgl>  <dbl> 
#> 1 Aaron Laffey 2010  RP   1 86.8 TRUE  -0.6 
#> 2 Aaron Laffey 2009  SP  NA 87.4 NA  0.0 
#> 3 Alexi Ogando 2012  RP   1 97.0 TRUE  1.9 
#> 4 Alexi Ogando 2011  SP  NA 95.1 NA  0.0 
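
Note that the filter above only checks that the adjacent year exists for the player, not which Position was recorded in that year. If that matters for your data, a stricter variant could look like the sketch below. It assumes at most one row per Name and Year; `pos_in_year` is a hypothetical helper introduced here for illustration, and only the columns needed for the check are rebuilt.

```r
library(dplyr)

# Question's sample data, reduced to the columns the check needs
df <- data.frame(
  Name = c("Aaron Heilman", "Aaron Heilman", "Aaron Laffey", "Aaron Laffey",
           "Alexi Ogando", "Alexi Ogando", "Alexi Ogando", "Alexi Ogando"),
  Year = c(2005L, 2003L, 2010L, 2009L, 2015L, 2013L, 2012L, 2011L),
  Position = c("RP", "SP", "RP", "SP", "RP", "SP", "RP", "SP"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: Position recorded in each `target` year within one
# group (NA when that year is absent). Assumes one row per Name and Year.
pos_in_year <- function(target, Year, Position) Position[match(target, Year)]

strict <- df %>%
  group_by(Name) %>%
  filter(
    # RP row: the previous year must exist and be an SP year
    (Position == "RP" & pos_in_year(Year - 1L, Year, Position) %in% "SP") |
    # SP row: the following year must exist and be an RP year
    (Position == "SP" & pos_in_year(Year + 1L, Year, Position) %in% "RP")
  ) %>%
  ungroup()

print(strict)
```

Using `%in%` rather than `==` for the comparison means a missing adjacent year (NA) counts as FALSE instead of propagating NA into the filter.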

We can use data.table (df1 here is the data frame from the question):

library(data.table) 
setDT(df1)[df1[, .I[abs(diff(Year))==1], .(Name, grp = cumsum(Position == "RP"))]$V1] 
#   Name Year Position Year_diff FBv ind1 velo_diff 
#1: Aaron Laffey 2010  RP   1 86.8 TRUE  -0.6 
#2: Aaron Laffey 2009  SP  NA 87.4 NA  0.0 
#3: Alexi Ogando 2012  RP   1 97.0 TRUE  1.9 
#4: Alexi Ogando 2011  SP  NA 95.1 NA  0.0 

Or use the same approach with dplyr:

library(dplyr) 
df1 %>% 
    group_by(Name, grp = cumsum(Position == "RP")) %>% 
    filter(abs(diff(Year))==1) %>% #below 2 steps may not be needed 
    ungroup() %>% 
    select(-grp) 
# A tibble: 4 × 7 
#   Name Year Position Year_diff FBv ind1 velo_diff 
#   <chr> <int> <chr>  <int> <dbl> <lgl>  <dbl> 
#1 Aaron Laffey 2010  RP   1 86.8 TRUE  -0.6 
#2 Aaron Laffey 2009  SP  NA 87.4 NA  0.0 
#3 Alexi Ogando 2012  RP   1 97.0 TRUE  1.9 
#4 Alexi Ogando 2011  SP  NA 95.1 NA  0.0
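
One caveat with the `diff(Year)` approach: it relies on every (Name, grp) group holding exactly one RP row followed by its SP row. With a group of one row, `diff(Year)` has length zero, and with three or more rows it has more than one element, so the filter condition may fail or not mean what you expect. A guarded variant might look like the sketch below. The "Lone Starter" row is made up purely to exercise the guard and is not part of the question's data.

```r
library(dplyr)

# Question's sample data (columns needed for the check only), plus one
# made-up player with a single row to exercise the guard
df1 <- data.frame(
  Name = c("Aaron Heilman", "Aaron Heilman", "Aaron Laffey", "Aaron Laffey",
           "Alexi Ogando", "Alexi Ogando", "Alexi Ogando", "Alexi Ogando",
           "Lone Starter"),
  Year = c(2005L, 2003L, 2010L, 2009L, 2015L, 2013L, 2012L, 2011L, 2014L),
  Position = c("RP", "SP", "RP", "SP", "RP", "SP", "RP", "SP", "SP"),
  stringsAsFactors = FALSE
)

guarded <- df1 %>%
  group_by(Name, grp = cumsum(Position == "RP")) %>%
  # `&&` short-circuits, so diff() is only evaluated for two-row groups,
  # where it returns a single value
  filter(n() == 2 && abs(diff(Year)) == 1) %>%
  ungroup() %>%
  select(-grp)

print(guarded)
```

Groups that are not an exact RP/SP pair are dropped rather than raising an error; whether that is the right behaviour depends on how your real data is laid out.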