2015-09-07

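How can I cut time-based yearly data into 36 parts in R?

I have a df like the one below, covering 30 years up to 2015. I want to cut each month into three parts, days 1-10, 11-20 and 21-31, and average the ten (or fewer) values in each part, so that every month yields three values. How can I do this?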

1993-01-29 28.92189 
1993-02-01 29.12760 
1993-02-02 29.18927 
1993-02-03 29.49786 
1993-02-04 29.62128 
1993-02-05 29.60068 
1993-02-08 29.60068 
1993-02-09 29.39498 
------ 
------ 
2015-08-18 209.92999 
2015-08-19 208.28000 
2015-08-20 204.01000 
2015-08-21 197.63001 
2015-08-24 189.55000 
2015-08-25 187.23000 
2015-08-26 194.67999 
2015-08-27 199.16000 
2015-08-28 199.24000 

What have you already tried yourself? Why didn't it solve your problem? – Heroka

Answers

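The tryCatch calls are there to handle the problem of the data starting partway through a month. I will add more information when I have time.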

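library(xts)

# Fake daily data over the same date range as the question
dates <- seq(as.Date("1993-01-29"), as.Date("2015-08-25"), by = "days")
sample <- rnorm(length(dates))

# One xts object per calendar month
tmpxts <- split(xts(x = sample, order.by = dates), f = "months")

# Average observations 1-10, 11-20 and 21-end of each month (by position).
# tryCatch falls back to a single mean when a month has too few observations.
mxts <- lapply(tmpxts, function(x) {
    tmp <- data.frame(val = tryCatch(c(mean(x[1:10]), mean(x[11:20]), mean(x[21:length(x)])),
                                     error = function(e) matrix(mean(x), 1)))
    row.names(tmp) <- tryCatch(index(x[c(1, 11, 21)]), error = function(e) index(x[1]))
    tmp
})

# Stack the per-month results into one data frame
do.call(rbind, mxts)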
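
The code above picks observations by their position within the month, which lines up with calendar days 1-10, 11-20 and 21-31 only when no days are missing. As a rough sketch of grouping by calendar day instead, the base R version below may help. The names df, dom, part, month and res are illustrative assumptions, not part of the answer, and the date range is shortened to keep the example small.

```r
# Fake daily data starting partway through a month, as in the question
set.seed(1)
dates <- seq(as.Date("1993-01-29"), as.Date("1993-04-30"), by = "day")
df <- data.frame(date = dates, value = rnorm(length(dates)))

# Day of month as an integer (1..31)
dom <- as.integer(format(df$date, "%d"))

# Days 1-10 -> part 1, 11-20 -> part 2, 21-31 -> part 3
df$part <- pmin((dom - 1L) %/% 10L, 2L) + 1L

# Year-month label, so parts from different months stay separate
df$month <- format(df$date, "%Y-%m")

# Mean of the values in each part of each month
res <- aggregate(value ~ month + part, data = df, FUN = mean)
res <- res[order(res$month, res$part), ]
res
```

Because the part is derived from the calendar day, a month with missing days (weekends, holidays, a late start) still gets its values assigned to the intended part; parts with no observations simply do not appear in res.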

The code below cuts each month into thirds based on the number of days in that month.

library(dplyr) 
library(lubridate) 
library(ggplot2) 

# Fake data 
df = data.frame(date=seq.Date(as.Date("2013-01-01"), 
           as.Date("2013-03-31"), by="day")) 

set.seed(394) 
df$value = rnorm(nrow(df), sqrt(1:nrow(df)), 2) 

# Cut months into thirds 
df = df %>% 
    # Create a new column to group by Year-Month 
    mutate(yr_mon = paste0(year(date) , "_", month(date, label=TRUE, abbr=TRUE))) %>% 
    group_by(yr_mon) %>% 
    # Cut each month into thirds 
    mutate(cutMonth = cut(day(date), 
         breaks=c(0, round(1/3*n()), round(2/3*n()), n()), 
         labels=c("1st third","2nd third","3rd third")), 
    # Add yr_mon to cutMonth so that we have a unique group label for 
    # each third of each month 
     cutMonth = paste0(yr_mon, "\n", cutMonth)) %>% 
    ungroup() %>% 
    # Turn cutMonth into a factor with correct date ordering 
    mutate(cutMonth = factor(cutMonth, levels=unique(cutMonth))) 

Here is the result:

# Show number of observations in each group 
as.data.frame(table(df$cutMonth)) 

                 Var1 Freq
1 2013_Jan\n1st third   10
2 2013_Jan\n2nd third   11
3 2013_Jan\n3rd third   10
4 2013_Feb\n1st third    9
5 2013_Feb\n2nd third   10
6 2013_Feb\n3rd third    9
7 2013_Mar\n1st third   10
8 2013_Mar\n2nd third   11
9 2013_Mar\n3rd third   10

# Plot means by group (just to visualize the result of the date grouping operations) 
ggplot(df, aes(cutMonth, value)) + 
    stat_summary(fun=mean, geom='point', size=4, colour="red") + 
    coord_cartesian(ylim=c(-0.2,10.2)) + 
    theme(axis.text.x = element_text(size=14)) 

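
The breaks above use n(), the number of rows in each month, so they match the calendar only when every day is present. For data with gaps (for example business days only), one possible variation is to take the month length from lubridate's days_in_month() and then average each third. This is a sketch under that assumption; df2, mdays, third, yr, mon and thirds are illustrative names that do not come from the answer.

```r
library(dplyr)
library(lubridate)

# Fake business-day data: weekends are dropped, so rows per month < days per month
all_days <- seq.Date(as.Date("2013-01-01"), as.Date("2013-03-31"), by = "day")
set.seed(394)
df2 <- data.frame(date = all_days, value = rnorm(length(all_days)))
df2 <- df2[!(as.POSIXlt(df2$date)$wday %in% c(0, 6)), ]  # 0 = Sunday, 6 = Saturday

thirds <- df2 %>%
  # Calendar length of the month, independent of how many rows are present
  mutate(mdays = as.integer(days_in_month(date)),
         # Same break points as round(1/3 * n) and round(2/3 * n) for complete months
         third = 1L + (day(date) > round(mdays / 3)) + (day(date) > round(2 * mdays / 3))) %>%
  group_by(yr = year(date), mon = month(date), third) %>%
  # Mean and number of observations in each third of each month
  summarise(mean_value = mean(value), n_obs = n(), .groups = "drop")

thirds
```

If fixed 1-10, 11-20 and 21-31 bins are preferred over proportional thirds, the third column could instead be built with cut(day(date), breaks = c(0, 10, 20, 31)).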

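Here is a base R solution that builds the cut periods from an increasing sequence of years and months, with your cuts at the 1st, 11th and 21st of each month. The base cut functions control which end of each interval is closed through the right argument; your specification calls for cuts at 1, 11 and 21 (with 10 and 20 closing their intervals), so I used right = TRUE. Note that with right = TRUE a value dated exactly on a break (the 1st, 11th or 21st) is counted in the preceding interval: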

tapply(dat$V2, cut.Date(dat$V1, 
         breaks=as.Date( 
           apply(expand.grid(c(1,11,21), 1:12, 1993:2015), 1, 
            function(x) paste(rev(x), collapse="-"))), right=TRUE), FUN=mean) 


1993-01-01 1993-01-11 1993-01-21 1993-02-01 1993-02-11 1993-02-21 1993-03-01 
        NA         NA   29.02475   29.48412         NA         NA         NA 
snipped many empty intervals 

and the bottom of the result includes:

2015-07-21 2015-08-01 2015-08-11 2015-08-21 2015-09-01 2015-09-11 2015-09-21 
        NA         NA  204.96250  193.97200         NA         NA         NA 
2015-10-01 2015-10-11 2015-10-21 2015-11-01 2015-11-11 2015-11-21 2015-12-01 
        NA         NA         NA         NA         NA         NA         NA 
2015-12-11 
        NA 
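
If observations dated on the 1st, 11th or 21st should open their own interval instead (days 1-10, 11-20 and 21-end, as the question describes), a possible variation is right = FALSE, which is the default of cut.Date. The sketch below assumes the first eight rows of the question's data sit in a data frame dat with columns V1 (Date) and V2 (numeric), as in the answer above; grid, breaks, bins and means are illustrative names, and only 1993 is covered to keep it short.

```r
# First rows of the question's data
dat <- data.frame(
  V1 = as.Date(c("1993-01-29", "1993-02-01", "1993-02-02", "1993-02-03",
                 "1993-02-04", "1993-02-05", "1993-02-08", "1993-02-09")),
  V2 = c(28.92189, 29.12760, 29.18927, 29.49786,
         29.62128, 29.60068, 29.60068, 29.39498)
)

# Break dates: the 1st, 11th and 21st of every month of 1993, in increasing order
grid <- expand.grid(day = c(1, 11, 21), month = 1:12, year = 1993)
breaks <- as.Date(paste(grid$year, grid$month, grid$day, sep = "-"))

# right = FALSE gives intervals [break, next break), so a break date starts its own interval
bins <- cut(dat$V1, breaks = breaks, right = FALSE)

# Mean per interval; intervals without observations come back as NA
means <- tapply(dat$V2, bins, mean)
means[!is.na(means)]
```

With this setting 1993-02-01 is averaged with the other early-February values rather than with 1993-01-29. Dates after the last break fall outside every interval and become NA, so the break sequence needs to extend past the end of the data.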