ggplot2, facet_wrap: plotting data twice in different facets

Say I have a data frame like this:

df <- data.frame(year_day = rep(1:365, 3), 
       year = rep(2001:2003, each = 365), 
       value = sin(2*pi*rep(1:365, 3)/365))

It represents a value (value) for each day of the year (year_day) between 2001 and 2003. I want to plot each year separately, and I'm using ggplot2 to do so:

ggplot(df) + geom_point(aes(year_day, value)) + facet_wrap(~year, ncol=1)

This gives me one panel per year (2001, 2002 and 2003), stacked in a single column.

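Great. Now, say I want to widen each plotting region slightly so that each year also includes the last 3 months of the previous year and the first 3 months of the following year (where those data exist). This means some data will be plotted twice. For example, the first three months of 2003 would appear in both the 2002 and the 2003 panels. I could duplicate those rows and assign them to 2002, but with year_day values from 366 to 455 (365 plus the 90 days of January to March). That works, but it's hacky. Is there a more elegant solution?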
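For reference, the manual workaround described in the question can be sketched in base R as follows. This is a minimal sketch, assuming a fixed 90-day tail on each side (days 1 to 90 and 276 to 365) and ignoring leap years; the names `head_rows`, `tail_rows` and `df_expanded` are illustrative only.

```r
df <- data.frame(year_day = rep(1:365, 3),
                 year = rep(2001:2003, each = 365),
                 value = sin(2 * pi * rep(1:365, 3) / 365))

# First ~3 months (days 1-90) of each year except the first,
# relabelled as days 366-455 of the previous year
head_rows <- subset(df, year_day <= 90 & year > min(year))
head_rows$year <- head_rows$year - 1
head_rows$year_day <- head_rows$year_day + 365

# Last ~3 months (days 276-365) of each year except the last,
# relabelled as days -89 to 0 of the following year
tail_rows <- subset(df, year_day > 275 & year < max(year))
tail_rows$year <- tail_rows$year + 1
tail_rows$year_day <- tail_rows$year_day - 365

# Original rows plus the duplicated edges
df_expanded <- rbind(tail_rows, df, head_rows)
```

Passing `df_expanded` to the same `ggplot(...) + geom_point(aes(year_day, value)) + facet_wrap(~year, ncol = 1)` call then pads each panel with its neighbours' edges.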

Asked 2017-09-14 by Lyngbakr

Answer

Edit: I deleted the old version and replaced it with this one.

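This is something I've been thinking about for a while, so it was a good enough reason to try implementing it. It still involves duplicating rows, which is hacky, but it's the best approach I could come up with.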

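Here's a tidy, pipeable function that takes a data frame (even a grouped one) as its first argument and a column of dates as its second. An optional third argument, expand, sets how far each window extends (default 0.25, i.e. 3 months). The fourth argument, offset, is meant for things like fiscal or academic years that don't run January to January, but I haven't thought that through yet.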

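The output is the same data frame, with the rows in the tails of each year duplicated, plus two extra columns: doy_wrapped, the day of the year (running from negative values to > 365), and nominal_yr, the year each window is centred on.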
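To see how the two new columns arise, here is the arithmetic for a single date, 2008-10-01, written in base R so it runs without lubridate. It is a sketch that mirrors the function's rules (`nominal_yr = year - adj_wrap`, `doy_wrapped = yday + 365 * adj_wrap`, then a window of `expand * 365` days on either side of the year); the variable names are illustrative.

```r
d <- as.Date("2008-10-01")
lt <- as.POSIXlt(d)
yr  <- lt$year + 1900   # calendar year
doy <- lt$yday + 1      # POSIXlt yday is 0-based; lubridate::yday() is 1-based

# Three copies: shifted into the next year, unchanged, shifted into the previous year
adj <- -1:1
copies <- data.frame(adj_wrap = adj,
                     nominal_yr = yr - adj,
                     doy_wrapped = doy + 365 * adj)

# Keep only copies that fall inside the expanded window
expand <- 3/12
keep <- copies$doy_wrapped >= -expand * 365 &
        copies$doy_wrapped <= (1 + expand) * 365
copies[keep, ]
```

Only the `adj_wrap = -1` copy (nominal year 2009, day -90) and the original (nominal year 2008, day 275) survive; the 2009/-90 copy is the one that appears as row 10 of the wrapped output.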

Example, using the ggplot2::economics dataset:

library(dplyr) 
library(lubridate) 
library(ggplot2)   # provides the economics dataset

economics %>% 
    filter(year(date) > 2007)

# A tibble: 88 x 6 
     date  pce pop psavert uempmed unemploy 
     <date> <dbl> <int> <dbl> <dbl> <int> 
1 2008-01-01 9963.2 303506  3.4  9.0  7685 
2 2008-02-01 9955.7 303711  3.9  8.7  7497 
3 2008-03-01 10004.2 303907  4.0  8.7  7822 
4 2008-04-01 10044.6 304117  3.5  9.4  7637 
5 2008-05-01 10093.3 304323  7.9  7.9  8395 
6 2008-06-01 10149.4 304556  5.6  9.0  8575 
7 2008-07-01 10151.1 304798  4.4  9.7  8937 
8 2008-08-01 10140.3 305045  3.7  9.7  9438 
9 2008-09-01 10083.2 305309  4.4 10.2  9494 
10 2008-10-01 9983.3 305554  5.4 10.4 10074 
# ... with 78 more rows

economics %>% 
    filter(year(date) > 2007) %>% 
    wrap_years(date, expand = 3/12)

# A tibble: 136 x 8 
# Groups: nominal_yr [8] 
     date  pce pop psavert uempmed unemploy nominal_yr doy_wrapped 
     <date> <dbl> <int> <dbl> <dbl> <int>  <dbl>  <dbl> 
1 2008-01-01 9963.2 303506  3.4  9.0  7685  2008   1 
2 2008-02-01 9955.7 303711  3.9  8.7  7497  2008   32 
3 2008-03-01 10004.2 303907  4.0  8.7  7822  2008   61 
4 2008-04-01 10044.6 304117  3.5  9.4  7637  2008   92 
5 2008-05-01 10093.3 304323  7.9  7.9  8395  2008   122 
6 2008-06-01 10149.4 304556  5.6  9.0  8575  2008   153 
7 2008-07-01 10151.1 304798  4.4  9.7  8937  2008   183 
8 2008-08-01 10140.3 305045  3.7  9.7  9438  2008   214 
9 2008-09-01 10083.2 305309  4.4 10.2  9494  2008   245 
10 2008-10-01 9983.3 305554  5.4 10.4 10074  2009   -90 
# ... with 126 more rows

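This does put rows out of order: each row is copied three times in sequence, and the copies are then reassigned to adjacent years. The original grouping is kept, and a grouping by the new nominal_yr is added (used to drop orphaned tails, i.e. windows whose central year has no data).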
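The orphan-tail filter at the end of the function keeps a `nominal_yr` group unless every row in it comes from some other year, which is the same as keeping it when at least one row comes from the central year. A small base-R check of that equivalence on made-up toy rows (the first two stand for January to March 2008 copies relabelled as nominal year 2007):

```r
actual_yr  <- c(2008, 2008, 2008, 2008)
nominal_yr <- c(2007, 2007, 2008, 2008)

# The function's test, sum(year != nominal_yr) != length(nominal_yr), per nominal year
orig_rule <- ave(actual_yr != nominal_yr, nominal_yr,
                 FUN = function(x) sum(x) != length(x))

# An equivalent formulation: does any row come from the central year?
any_rule <- ave(actual_yr == nominal_yr, nominal_yr, FUN = any)

orig_rule
any_rule
```

Both rules drop the nominal-2007 group (no 2007 data) and keep the nominal-2008 group.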

economics %>% 
    filter(year(date) > 2007) %>% 
    wrap_years(date, expand = 3/12) %>% 
    ggplot(aes(doy_wrapped, unemploy)) + 
    geom_line() + facet_wrap(~nominal_yr, ncol = 3)

Then a few tricks to dress it up and fix the axis:

economics %>% 
    filter(year(date) > 2007) %>% 
    wrap_years(date, expand = 3/12) %>% 
    ggplot(aes(doy_wrapped + ymd("1900-01-01") - 1, unemploy)) + 
    geom_line() + facet_wrap(~nominal_yr, ncol = 2) + 
    # mark the start and end of the central year; Date values match the date scale
    geom_vline(xintercept = ymd(c("1900-01-01", "1901-01-01"))) + 
    scale_x_date(date_breaks = "2 months", date_labels = "%b", 
                 name = NULL, expand = c(0, 0)) + 
    theme_minimal() + 
    theme(panel.spacing.x = unit(1, "cm"))

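The + ymd("1900-01-01") - 1 inside aes(...) is arbitrary; you just want day 1 to line up with a 1 January so that each year gets the right month labels. Then you match it with the xintercept = values of the vertical lines.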
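A quick base-R check of what that shift does to a few `doy_wrapped` values. `as.Date("1900-01-01")` is used here in place of lubridate's `ymd("1900-01-01")`, which returns the same `Date`:

```r
doy_wrapped <- c(-90, 0, 1, 365, 366, 455)

# Day 1 maps to 1900-01-01; negative days fall in late 1899, days > 365 in early 1901
dates <- as.Date("1900-01-01") + doy_wrapped - 1
format(dates, "%Y-%m-%d")
```

Because 1900 is not a leap year, day 366 falls exactly on 1901-01-01, where the second vertical line is drawn.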

Ideally this would eventually become part of a family of wrap_* functions for quarters, months, hours, decades, etc.

The function's code:

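library(dplyr)      # mutate(), filter(), between(), group_by(), pull()
library(lubridate)  # year(), yday(), is.Date()
library(rlang)      # enquo(), quo_name(), !!

wrap_years <- function(df, datecol, expand = 0.25, offset = "2001-01-01") { 

    if(!is.data.frame(df)) {return(df)} 

    datecol <- enquo(datecol) 

    if(expand > 1) { 
        warning("Window expansions of > 1 are not supported.") 
        return(df) 
    } 

    if(!(quo_name(datecol) %in% names(df))) { 
        warning(paste0("Column '", quo_name(datecol), "' not found in data.")) 
        return(df) 
    } 

    # offset (fiscal/academic years) is not implemented yet:
    # offset <- as_date(offset) 
    # warning(paste0("Using ", stamp("August 26", orders = "md")(offset), 
    #    " as start of year. Not yet implemented.")) 

    if(!is.Date(df %>% pull(!!datecol))) { 
        warning(paste0("Use lubridate functions to parse '", 
            quo_name(datecol), 
            "' before proceeding.")) 
        return(df) 
    } 

    df %>% 
        # three copies of every row: shifted to next year, unchanged, shifted to previous year
        mutate(adj_wrap = list(-1:1)) %>% 
        tidyr::unnest(adj_wrap) %>% 
        mutate(nominal_yr = year(!!datecol) - adj_wrap, 
               doy_wrapped = yday(!!datecol) + 365 * adj_wrap) %>% 
        # keep only copies that fall inside the expanded window
        filter(between(doy_wrapped, -expand * 365, (1 + expand) * 365)) %>% 
        select(-adj_wrap) %>% 
        # .add = TRUE keeps any existing groups (dplyr >= 1.0; older versions used add = TRUE)
        group_by(nominal_yr, .add = TRUE) %>% 
        # drop nominal years whose rows all come from other years (orphaned tails)
        filter(sum(year(!!datecol) != nominal_yr) != length(nominal_yr)) 

}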

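I assumed that duplicating the minimum number of rows would be the fastest approach, and that was the idea behind my first attempt. Thinking about it later, I realised that the more naive approach of simply duplicating all rows would actually be faster. The filtering step is then done with between, which is also fast. This version of the function is about 2x as fast as the previous one (but still only about 0.01x as fast as plotting the original data).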
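The "duplicate everything, then filter" strategy can also be written without dplyr. The following is a hypothetical base-R analogue (`wrap_years_base` is an illustrative name, not part of any package), assuming a `Date` column passed by name as a string and, like the original, a fixed 365-day year. It has not been benchmarked against the function above.

```r
# Hypothetical base-R analogue of wrap_years(); not the answer's code
wrap_years_base <- function(df, datecol, expand = 0.25) {
  dates <- df[[datecol]]
  stopifnot(inherits(dates, "Date"))
  n <- nrow(df)

  # Naive strategy: three copies of every row (next year, same year, previous year)
  idx <- rep(seq_len(n), each = 3)
  adj <- rep(-1:1, times = n)
  out <- df[idx, , drop = FALSE]
  lt <- as.POSIXlt(dates[idx])
  actual_yr <- lt$year + 1900
  out$nominal_yr <- actual_yr - adj
  out$doy_wrapped <- lt$yday + 1 + 365 * adj   # POSIXlt yday is 0-based

  # Keep only copies inside the expanded window
  in_window <- out$doy_wrapped >= -expand * 365 &
               out$doy_wrapped <= (1 + expand) * 365
  out <- out[in_window, , drop = FALSE]
  actual_yr <- actual_yr[in_window]

  # Drop nominal years with no rows from the central year (orphaned tails)
  has_centre <- ave(actual_yr == out$nominal_yr, out$nominal_yr, FUN = any)
  out <- out[has_centre, , drop = FALSE]
  rownames(out) <- NULL
  out
}
```

On monthly dates from January 2008 to December 2009 with the default `expand = 0.25`, the copies assigned to nominal years 2007 and 2010 are dropped as orphaned tails, leaving windows for 2008 and 2009 only.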

Answered 2017-09-14 23:25:40 by Brian
