2017-05-15 91 views
0

How can I group by academic year?

I have a data frame like this:

data.frame(
     date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
     16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
     n= 1:14 
    ) 

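How can I sum n by academic year? Each academic year should run from December to August, and I want the total of n within each one. Recoding by hand is not an option, because there are too many values and some of them are occasionally missing.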

In the end, the recoding should look like this:

date   a.y. 

"2012-05-01" 2011/2012 
"2012-08-01" 2011/2012 

"2012-12-01" 2012/2013 
"2013-05-01" 2012/2013 
"2013-08-01" 2012/2013 

"2013-12-01" 2013/2014 
"2014-05-01" 2013/2014 

"2014-12-01" 2014/2015 
"2015-05-01" 2014/2015 
"2015-08-01" 2014/2015 

"2015-12-01" 2015/2016 
"2016-05-01" 2015/2016 
"2016-08-01" 2015/2016 

"2016-12-01" 2016/2017 

As you can see, the dates follow a similar pattern, but each academic year may contain a different number of dates.

+0

I don't understand your recoded output. Where is n? Don't you want just one row per academic year in the final output? – Kristofersen

+1

Also, shouldn't the last value be 2016/2017? That would be consistent with the rest of a.y. – Kristofersen

+0

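@Kristofersen Thanks. It didn't seem clear enough which dates correspond to which academic year, and the pseudo-output is only meant to show that. Also, it makes no difference to me whether each academic year ends up as repeated rows carrying the same `n` value or as a single unique row. – Dambo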

Answers

1

If I'm reading this right, the academic year changes as soon as we see a December entry. If that's true, the code below will work.

library(data.table) 
library(lubridate) 
df = data.frame(
    date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
        16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
    n= 1:14 
) 

df$AcademicYear = cumsum(month(df$date) == 12) 
setDT(df) 
df[ , .(Sum = sum(n)), by = .(AcademicYear)] 

    AcademicYear Sum 
1:   0 3 
2:   1 12 
3:   2 13 
4:   3 27 
5:   4 36 
6:   5 14 
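To make the cumulative-sum trick easier to follow, here is a minimal base-R sketch of the same idea, broken into steps. The names `is_december`, `group_id` and `sums` are only illustrative. It assumes the rows are sorted by date and that each academic year contains exactly one December row; with several December dates in one year, the counter would advance more than once.

```r
# Same dates and values as in the question (days since 1970-01-01).
dates <- structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
                     16405, 16556, 16648, 16770, 16922, 17014, 17136),
                   class = "Date")
n <- 1:14

# TRUE on every December row, i.e. wherever a new academic year starts.
is_december <- format(dates, "%m") == "12"

# Each TRUE raises the running total by one, so all rows share an index
# until the next December row appears (assumes dates are sorted).
group_id <- cumsum(is_december)

# Sum n within each group index.
sums <- tapply(n, group_id, sum)
print(sums)
```

The two rows before the first December get index 0, which is why the first group in the output above is labelled 0.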

Edit

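For the recoding, you can do something like this. Within each AcademicYear group it looks for a known month (May, then August, then December), and depending on which month it finds, it knows whether to add or subtract a year before pasting the two years together. After that, the column only needs to be renamed and summed as above.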

df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"), 
           ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"), 
             paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)] 

> df 
      date n AcademicYear AcademicYear2 
1: 2012-05-01 1   0  2011/2012 
2: 2012-08-01 2   0  2011/2012 
3: 2012-12-01 3   1  2012/2013 
4: 2013-05-01 4   1  2012/2013 
5: 2013-08-01 5   1  2012/2013 
6: 2013-12-01 6   2  2013/2014 
7: 2014-05-01 7   2  2013/2014 
8: 2014-12-01 8   3  2014/2015 
9: 2015-05-01 9   3  2014/2015 
10: 2015-08-01 10   3  2014/2015 
11: 2015-12-01 11   4  2015/2016 
12: 2016-05-01 12   4  2015/2016 
13: 2016-08-01 13   4  2015/2016 
14: 2016-12-01 14   5  2016/2017 
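As an alternative to the grouped lookup, and not the method used in this answer, the label can also be derived from each date on its own. The sketch below assumes that an academic year starts in December, as the question states, so a December date belongs to the year that begins then and every other month belongs to the year that began the previous December. The names `start_year` and `academic_year` are illustrative.

```r
# Same dates as in the question (days since 1970-01-01).
dates <- structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
                     16405, 16556, 16648, 16770, 16922, 17014, 17136),
                   class = "Date")

yr <- as.integer(format(dates, "%Y"))  # calendar year of each date
mo <- as.integer(format(dates, "%m"))  # calendar month of each date

# Assumption: December opens a new academic year; all other months
# belong to the academic year that started the previous December.
start_year <- ifelse(mo == 12L, yr, yr - 1L)

# Build labels such as "2011/2012".
academic_year <- paste(start_year, start_year + 1L, sep = "/")
print(data.frame(date = dates, academic_year = academic_year))
```

Because each row is labelled independently, this version does not depend on row order or on a December row being present in every year.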

Edit 2

I decided to put all the code together. This should get you the final result you want.

library(data.table) 
library(lubridate) 
df = data.frame(
    date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
        16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
    n= 1:14 
) 

setDT(df) 
df$AcademicYear = cumsum(month(df$date) == 12) 

df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"), 
           ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"), 
             paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)] 


df = df[ , .(Sum = sum(n)), by = .(AcademicYear = AcademicYear2)] 

> df 
    AcademicYear Sum 
1: 2011/2012 3 
2: 2012/2013 12 
3: 2013/2014 13 
4: 2014/2015 27 
5: 2015/2016 36 
6: 2016/2017 14 
0

I'm not sure which conditions you want for which dates, but you can use dplyr's mutate with a series of nested ifelse statements. It's slow, but it works.

library(dplyr)  # provides mutate()

df <- data.frame(
    date = structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
        16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
    n = 1:14
)

# Label each row by hard-coded date ranges; the ranges are only examples
# and should be adjusted to the real academic-year boundaries.
# Rows that fall outside every range are labelled "other".
df <- mutate(df, term = ifelse(date >= as.Date("2012-05-01") & date <= as.Date("2012-08-01"), "1",
                        ifelse(date >= as.Date("2012-12-01") & date <= as.Date("2013-05-01"), "2",
                        ifelse(date >= as.Date("2013-12-01") & date <= as.Date("2014-12-01"), "3",
                        ifelse(date >= as.Date("2015-08-01") & date <= as.Date("2016-08-01"), "4",
                               "other")))))
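If writing out every range by hand becomes unwieldy, one possible alternative is to generate the boundaries and look each date up against them. The sketch below is a swapped-in technique in base R using `findInterval`, not part of the dplyr answer above. It assumes that every academic year starts on 1 December, and the `breaks` vector is a hypothetical set of boundaries chosen to cover the example data.

```r
# Same dates and values as in the question (days since 1970-01-01).
dates <- structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
                     16405, 16556, 16648, 16770, 16922, 17014, 17136),
                   class = "Date")
n <- 1:14

# Hypothetical boundaries: 1 December of every year covering the data.
breaks <- seq(as.Date("2011-12-01"), as.Date("2017-12-01"), by = "year")

# For each date, the index of the last boundary that is <= the date.
idx <- findInterval(as.numeric(dates), as.numeric(breaks))

# Turn the boundary that opened each interval into a label like "2011/2012".
start_year <- as.integer(format(breaks[idx], "%Y"))
term <- paste(start_year, start_year + 1L, sep = "/")

# Sum n per label.
sums <- tapply(n, term, sum)
print(sums)
```

The lookup is vectorised, so it avoids the chain of nested ifelse calls, and changing the start month only means changing the first boundary date.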