2016-12-02

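Weekly means in R

I have a 34-year set of daily gridded sea surface temperature (SST) values (12418 daily files x 4248 points each) and I want to compute weekly values. Following this post https://stackoverflow.com/a/15102394/709777 I have almost succeeded, but there is a mismatch between the dates and the weeks. I cannot find where it comes from, and I want to be sure I am using the right dates to compute the weekly means.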

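I use this piece of my R script to read the daily data and build a large data frame (4248 columns/points by 12418 rows/days) in which each column holds all the daily values of a single point.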

library(lubridate) # assumed source of week() and year() used below

# Paths 
ruta_datos_diarios<-"/home/meteo/PROJECTES/VERSUS/DATA/SST/CSV/" 
ruta_files<-"/home/meteo/PROJECTES/VERSUS/SCRIPTS/CLUSTER/FILES/" 
ruta_eixida<-"/home/meteo/PROJECTES/VERSUS/OUTPUT/DATA/SEMANAL/" 

# List of daily files 
files <- list.files(path = ruta_datos_diarios, pattern = "SST-diaria-MED") 

output <- matrix(ncol=4248, nrow=length(files)) 
fechas <- matrix(ncol=1, nrow=length(files)) 

for (i in seq_along(files)){ 
    # read data 
    datos<-read.csv(paste0(ruta_datos_diarios,files[i]),header=TRUE,na.strings = "NA") 
    datos<-datos[complete.cases(datos),] 

    # Extract dates from daily file names 
    yyyy<-substr(files[i],16,19) 
    mm<-substr(files[i],20,21) 
    dd<-substr(files[i],22,23) 
    # store the date in the fechas matrix created above 
    fechas[i,]<-paste0(yyyy,"-",mm,"-",dd) 

    output[i,]<-t(datos$sst) 
} 

datos.df<-as.data.frame(output) 

# Build a dataframe with the dates (day, week and year) 
fechas<-as.data.frame(fechas) 
fechas$V1<-as.Date(fechas$V1) 
fechas$Week <- week(fechas$V1) 
fechas$Year <- year(fechas$V1) 

# Extract day of the week (Saturday = 6) 
fechas$Week_Day <- as.numeric(format(fechas$V1, format='%w')) 
# Adjust end-of-week date (first saturday from the original Date) 
fechas$End_of_Week <- fechas$V1 + (6 - fechas$Week_Day) 

# new dataframe from End_of_Week 
fechas.semana<-fechas[!duplicated(fechas$End_of_Week),] 
fechas.semana<-as.data.frame(fechas.semana) 

colnames(fechas)<-c("Day","Week","Year","Week_Day","End_of_Week") 
colnames(fechas.semana)<-c("Day","Week","Year","Week_Day","End_of_Week") 

This is how I read the data and the dates. To keep the example short, I have saved part of the data frame in this file temp-sst.csv (10 variables, including "Day", "Week", "Year", "Week_Day" and "End_of_Week").

sst.dat <- read.csv("temp-dat.csv",header=TRUE) 

# Join dates and SST values 
sst.dat <- cbind(fechas, sst.dat) 

# Build new dates data frame 
fechas<-as.data.frame(sst.dat$Day) 
colnames(fechas)<-c("Day") 
fechas$Day<-as.Date(fechas$Day) 
fechas$Week <- week(fechas$Day) 
fechas$Year <- year(fechas$Day) 
# Extract day of the week (Saturday = 6) 
fechas$Week_Day <- as.numeric(format(fechas$Day, format='%w')) 
# Adjust end-of-week date (first saturday from the original Date) 
fechas$End_of_Week <- fechas$Day + (6 - fechas$Week_Day) 

fechas.semana<-fechas[!duplicated(fechas$End_of_Week),] 
fechas.semana<-as.data.frame(fechas.semana) 

colnames(fechas)<-c("Day","Week","Year","Week_Day","End_of_Week") 
colnames(fechas.semana)<-c("Day","Week","Year","Week_Day","End_of_Week") 

# Weekly aggregation function from the referred post 
media.semanal <- function(x, column){ 
    a<-aggregate(x[,column]~End_of_Week+Year, FUN=mean, data=x, na.rm=TRUE) 
    colnames(a)<-c("End_of_Week","Year","SSTmean") 
    return(a) 
} 

# Matrix to be populated by weekly function 
SST.mat<-matrix(nrow=nrow(fechas.semana), ncol=length(sst.dat)-5) # 5 is the number of date columns 

for (j in 6:length(sst.dat)){ # start at 6 to skip the date columns 
    b<-media.semanal(sst.dat,j) 
    SST.mat[,j-5]<-b$SSTmean 
} 

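But here comes the problem. The "b" data frame built inside the loop has 145 rows, while SST.mat and fechas.semana have only 144. I have not been able to find where this inconsistency comes from.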

Any help would be greatly appreciated, I am stuck here. Thanks

+6

"_To keep a short example_" - instead of posting a link to a 1000 * 10 file on Dropbox, you should provide a _minimal_, self-contained example. – Henrik

+0

You are right @henrik, I have flagged your comment as useful – pacomet

Answer

1

You have a duplicate in b$End_of_Week.

First, I noticed that there is no difference in set membership between the two:

setdiff(as.character(b$End_of_Week),as.character(fechas.semana$End_of_Week)) 

character(0)

Then I realized that the extra row must come from a duplicate, and confirmed it like this:

table(table(as.character(b$End_of_Week))>1) 

FALSE  TRUE 
  143     1 

Looking at the table shows that the culprit is 1983-01-01.
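One way to pull out the duplicated date directly, rather than reading it off the table, is `duplicated()`. This is only a sketch: the small `b` below is a made-up stand-in for the real aggregated data frame, and `dup_dates` is a name introduced here for illustration.

```r
# Toy stand-in for the aggregated data frame `b` (SST values are made up).
b <- data.frame(
  End_of_Week = as.Date(c("1982-12-25", "1983-01-01", "1983-01-01", "1983-01-08")),
  Year        = c(1982, 1982, 1983, 1983),
  SSTmean     = c(15.2, 15.0, 14.9, 14.7)
)

# Dates that occur more than once in End_of_Week
dup_dates <- unique(b$End_of_Week[duplicated(b$End_of_Week)])
print(dup_dates)

# All rows sharing those dates, to see which column tells them apart
print(b[b$End_of_Week %in% dup_dates, ])
```

In this toy case the two rows for the same End_of_Week differ only in Year, which hints at where the extra row comes from.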

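The root cause seems to be that you aggregate by End_of_Week + Year. Year is unnecessary, because End_of_Week already carries the year. Worse, the week ending on Saturday 1983-01-01 contains days from both 1982 and 1983, so grouping by Year as well splits that week into two rows. If you aggregate by End_of_Week only, you get 144 rows instead of 145.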
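The effect can be reproduced on a small scale. The sketch below uses fourteen consecutive days around New Year 1983 with made-up SST values; the names `toy`, `by_week_year` and `by_week` are introduced here for illustration, and `Year` is derived with base R's `format()` instead of the `year()` call used in the question.

```r
# Fourteen consecutive days around New Year 1983 with made-up SST values.
days <- seq(as.Date("1982-12-26"), as.Date("1983-01-08"), by = "day")
toy <- data.frame(Day = days, sst = seq(15, 14, length.out = length(days)))

# Same date columns as in the question, built with base R only.
toy$Year        <- as.numeric(format(toy$Day, "%Y"))
toy$Week_Day    <- as.numeric(format(toy$Day, "%w"))  # Sunday = 0, Saturday = 6
toy$End_of_Week <- toy$Day + (6 - toy$Week_Day)

# Grouping by week and year splits the week that straddles New Year.
by_week_year <- aggregate(sst ~ End_of_Week + Year, data = toy, FUN = mean)
# Grouping by week only keeps it in one piece.
by_week      <- aggregate(sst ~ End_of_Week, data = toy, FUN = mean)

print(by_week_year)
print(by_week)
```

The days from 1982-12-26 to 1983-01-01 all share End_of_Week 1983-01-01 but fall in two different years, so `by_week_year` has one row more than `by_week`.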

# Weekly aggregation function from the referred post 
media.semanal <- function(x, column){ 
    a<-aggregate(x[,column]~End_of_Week, FUN=mean, data=x, na.rm=TRUE) 
    colnames(a)<-c("End_of_Week","SSTmean") 
    return(a) 
} 

# Matrix to be populated by weekly function 
SST.mat<-matrix(nrow=nrow(fechas.semana), ncol=length(sst.dat)-5) # 5 is the number of date columns 

for (j in 6:length(sst.dat)){ # start at 6 to skip the date columns 
    b<-media.semanal(sst.dat,j) 
    SST.mat[,j-5]<-b$SSTmean 
} 
dim(b)
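As a possible alternative to the column-by-column loop, `aggregate()` can average all value columns in a single call when given a data frame and a `by` list. This is a sketch on made-up data, not a drop-in replacement: `toy`, `value_cols` and `weekly` are names introduced here, and the two columns V1 and V2 stand in for the 4248 grid points.

```r
# Made-up daily data: two value columns standing in for the grid points.
days <- seq(as.Date("1982-12-26"), as.Date("1983-01-08"), by = "day")
toy <- data.frame(
  End_of_Week = days + (6 - as.numeric(format(days, "%w"))),  # next Saturday
  V1 = seq(15, 14, length.out = length(days)),
  V2 = seq(16, 15, length.out = length(days))
)
value_cols <- setdiff(names(toy), "End_of_Week")

# One call computes the weekly mean of every value column.
weekly <- aggregate(toy[value_cols],
                    by = list(End_of_Week = toy$End_of_Week),
                    FUN = mean, na.rm = TRUE)

# Same layout as SST.mat: one row per week, one column per point.
SST.mat <- as.matrix(weekly[value_cols])
print(dim(SST.mat))
```

With the `by` form, `na.rm = TRUE` is applied per column, so a missing value in one column does not drop that day from the other columns.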