2017-07-09 26 views
2

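dplyr: filtering for the years before a flagged time period

I have a list of year-country-specific dummies, and I would also like to flag the two years before each flagged year.

The data looks like this: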

library(tidyverse) 

df <- tribble(
    ~year, ~country, ~occurrence, 
    #--|--|---- 
    2003, "USA", 1, 
    2004, "USA", 0, 
    2005, "USA", 0, 
    2006, "USA", 0, 
    2007, "USA", 0, 
    2008, "USA", 0, 
    2009, "USA", 0, 
    2010, "USA", 0, 
    2011, "USA", 1, 
    2012, "USA", 0, 
    2013, "USA", 0, 
    2005, "FRA", 0, 
    2006, "FRA", 0, 
    2007, "FRA", 1, 
    2008, "FRA", 1, 
    2009, "FRA", 0, 
    2010, "FRA", 0, 
    2011, "FRA", 0, 
    2012, "FRA", 0, 
    2013, "FRA", 0, 
    2014, "FRA", 0, 
    2015, "FRA", 1 
) 

So for "USA" I would also like a 1 in the occurrence column for 2009 and 2010, and for "FRA" for the years 2005, 2006, 2013 and 2014.

I thought about doing something like this:

df %>% 
    group_by(country) %>% 
    mutate(occurrence = ifelse("not sure what to put here", # placeholder for the condition
          1, 
          0)) 

But I don't know how to get R to pick out the years I want.

+0

You need the following condition: '(country == "USA" & year %in% 2009:2010) | (country == "FRA" & year %in% c(2005, 2006, 2013, 2014))' – Jaap
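Applied inside mutate(), the condition from this comment could look like the sketch below. The countries and years are hard-coded from the question's example, so this does not generalise to other data, and the small df here is only an excerpt of the example data used for illustration.

```r
library(dplyr)
library(tibble)

# Excerpt of the example data (USA rows only, for brevity)
df <- tibble(
  year       = 2008:2011,
  country    = "USA",
  occurrence = c(0, 0, 0, 1)
)

# Set occurrence to 1 for the hard-coded country/year pairs,
# and keep the existing value everywhere else
flagged <- df %>%
  mutate(occurrence = ifelse(
    (country == "USA" & year %in% 2009:2010) |
      (country == "FRA" & year %in% c(2005, 2006, 2013, 2014)),
    1, occurrence
  ))
```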

Answers

1

Here is another dplyr solution:

df %>% 
    group_by(country) %>% 
    mutate(
     occurrence=ifelse(lead(occurrence, 1) %in% 1 | 
          lead(occurrence, 2) %in% 1, 
          1, occurrence) 
     ) 

# A tibble: 22 x 3 
# Groups: country [2] 
    year country occurrence 
    <dbl> <chr>  <dbl> 
1 2003  USA   1 
2 2004  USA   0 
3 2005  USA   0 
4 2006  USA   0 
5 2007  USA   0 
6 2008  USA   0 
7 2009  USA   1 
8 2010  USA   1 
9 2011  USA   1 
10 2012  USA   0 
11 2013  USA   0 
12 2005  FRA   1 
13 2006  FRA   1 
14 2007  FRA   1 
15 2008  FRA   1 
16 2009  FRA   0 
17 2010  FRA   0 
18 2011  FRA   0 
19 2012  FRA   0 
20 2013  FRA   1 
21 2014  FRA   1 
22 2015  FRA   1 

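lead(occurrence, 1) %in% 1 is used instead of lead(occurrence, 1) == 1 because the latter cannot handle NA: lead() returns NA for the last rows of each group, NA == 1 evaluates to NA, and that NA would end up in the result, whereas NA %in% 1 is simply FALSE.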
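The difference is easy to see on a short vector. This is a minimal illustration in which x is a made-up vector standing in for one country's occurrence values.

```r
library(dplyr)

x <- c(0, 1, 0)

shifted <- lead(x, 1)        # 1 0 NA: the last element has no following value
with_eq <- shifted == 1      # TRUE FALSE NA: the comparison keeps the NA
with_in <- shifted %in% 1    # TRUE FALSE FALSE: %in% treats NA as "no match"
```

With ==, the NA would be passed through by ifelse() into the new column; with %in%, the last rows of each group fall through to their original occurrence value.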

+0

Perfect, that's what I was looking for! –

2

After grouping by 'country', we can take the lead of 'occurrence' up to 2 rows ahead and get the row-wise maximum with pmax:

df %>% 
    group_by(country) %>% 
    mutate(occurrence = pmax(occurrence, lead(occurrence, default = 0), 
        lead(occurrence, default=0, n=2))) 

This gives the expected output. Alternatively, the same result can be achieved with data.table using a similar approach:

library(data.table) 
setDT(df)[, occurrence := do.call(pmax, shift(occurrence, n = 0:2, 
    type = "lead", fill = 0)), country] 
df 
# year country occurrence 
# 1: 2003  USA   1 
# 2: 2004  USA   0 
# 3: 2005  USA   0 
# 4: 2006  USA   0 
# 5: 2007  USA   0 
# 6: 2008  USA   0 
# 7: 2009  USA   1 
# 8: 2010  USA   1 
# 9: 2011  USA   1 
#10: 2012  USA   0 
#11: 2013  USA   0 
#12: 2005  FRA   1 
#13: 2006  FRA   1 
#14: 2007  FRA   1 
#15: 2008  FRA   1 
#16: 2009  FRA   0 
#17: 2010  FRA   0 
#18: 2011  FRA   0 
#19: 2012  FRA   0 
#20: 2013  FRA   1 
#21: 2014  FRA   1 
#22: 2015  FRA   1 
+0

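The 'data.table' version works for me. The 'dplyr' version works on the example dataset but not on my real-life dataset. It misses some of my dummies and adds some lagged dummies instead of the leads I want. I don't know why. –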
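One possible explanation, offered only as a guess since the real dataset is not shown: lead() works on row positions, not on the values of year, so if the rows are not sorted by year within each country, flags are set relative to the wrong neighbours. The sketch below assumes that is the cause and sorts before shifting; df_unsorted is hypothetical data made up for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical data: a slice of the example with the rows out of order
df_unsorted <- tibble(
  year       = c(2011, 2009, 2010, 2008),
  country    = "USA",
  occurrence = c(1, 0, 0, 0)
)

fixed <- df_unsorted %>%
  arrange(country, year) %>%   # chronological order within each country
  group_by(country) %>%
  mutate(occurrence = pmax(occurrence,
                           lead(occurrence, n = 1, default = 0),
                           lead(occurrence, n = 2, default = 0))) %>%
  ungroup()
```

Without the arrange() step, the 2011 event sits in the first row, so no earlier row sees it as a lead and 2009 and 2010 stay at 0. Note that this still assumes there are no missing years within a country; with gaps, "two rows ahead" is no longer the same as "two years ahead".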
