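Melt and reshape into a new data frame in R
2013-04-29

I just downloaded a large amount of temperature data from our data logger. The data frame holds the hourly mean temperatures of 87 temperature sensors, observed over 1691 hours (so there is a lot of data here). It looks like this: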

D1_A  D1_B  D1_C 
13.43 14.39 12.33 
12.62 13.53 11.56 
11.67 12.56 10.36 
10.83 11.62 9.47 

I would like to reshape this dataset into a matrix that looks like this:

#create a blank matrix 5 columns 131898 rows 
matrix1<-matrix(nrow=131898, ncol=5) 
colnames(matrix1)<- c("year", "ID", "Soil_Layer", "Hour", "Temperature") 

where:

year is always "2012" 
ID corresponds to the header ID (e.g. D1) 
Soil_Layer corresponds to the second part of the header (e.g. A, B, or C) 
Hour = 1:1691 for each sensor 
Temperature = the observed values in the original data frame. 
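A minimal sketch of the header-splitting step described in this list, using base R regular expressions rather than the reshape package. It only uses the three column names shown in the sample and assumes every header has the form `<ID>_<layer>` with exactly one underscore; the real file's 87 headers are not shown, so that is an assumption.

```r
# Column names from the sample data; the real file has 87 such columns
# (assumed to follow the same "<ID>_<Soil_Layer>" pattern).
sensor_cols <- c("D1_A", "D1_B", "D1_C")

# Everything before the underscore is the sensor ID ...
ID <- sub("_.*$", "", sensor_cols)
# ... and everything after it is the soil layer.
Soil_Layer <- sub("^.*_", "", sensor_cols)

print(data.frame(column = sensor_cols, ID = ID, Soil_Layer = Soil_Layer))
```

If a header could contain more than one underscore, the two patterns would need adjusting, since `"^.*_"` is greedy and would keep only the text after the last underscore.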

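Can this be done with the reshape package in R? Does it need to be done with a loop? Any input on how to handle this dataset would be helpful. Cheers!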

Where does 131898 come from? 1691 * 87 = 147117. – Chase 2013-04-30 00:10:32
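The figure in this comment follows from the long format having one row per sensor per hour. A quick R check, using only the 87 sensors and 1691 hours stated in the question:

```r
# Figures stated in the question: 87 sensor columns, 1691 hourly rows each.
n_sensors <- 87
n_hours   <- 1691

# Melting produces one row per (sensor, hour) pair, so the long table
# has n_sensors * n_hours rows.
expected_rows <- n_sensors * n_hours
expected_rows
#> [1] 147117
```

On the real data, comparing this with `nrow()` of the melted result would flag a mismatch like the 131898 used for the blank matrix.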

Answer

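I think this does what you want... You can use the colsplit() and melt() functions from the reshape2 package. It's not clear where the Hour in your data is determined, so I've assumed it comes from the row order of the original dataset. If that's not the case, please update your question: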

library(reshape2) 
#read in your data 
x <- read.table(text = " 

    D1_A D1_B D1_C 
    13.43 14.39 12.33 
    12.62 13.53 11.56 
    11.67 12.56 10.36 
    10.83 11.62 9.47 
    9.98 10.77 9.04 
    9.24 10.06 8.65 
    8.89 9.55 8.78 
    9.01 9.39 9.88 
", header = TRUE) 

#add hour index, if data isn't ordered, replace this with whatever 
#tells you which hour goes where 
x$hour <- 1:nrow(x) 
#Melt into long format 
x.m <- melt(x, id.vars = "hour") 
#Split into two columns 
x.m[, c("ID", "Soil_Layer")] <- colsplit(x.m$variable, "_", c("ID", "Soil_Layer")) 
#Add the year 
x.m$year <- 2012 

#Return the first 6 rows 
head(x.m[, c("year", "ID", "Soil_Layer", "hour", "value")]) 
#---- 
  year ID Soil_Layer hour value
1 2012 D1          A    1 13.43
2 2012 D1          A    2 12.62
3 2012 D1          A    3 11.67
4 2012 D1          A    4 10.83
5 2012 D1          A    5  9.98
6 2012 D1          A    6  9.24
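For comparison, here is a sketch of the same reshaping in base R only, using `stack()` instead of reshape2 and naming the columns exactly as in the question (`Hour`, `Temperature`). Like the answer above, it assumes the rows are already in hour order and that each header contains one underscore; the variable name `long` is just illustrative.

```r
# Same eight rows of sample data as in the answer above.
x <- read.table(text = "
D1_A D1_B D1_C
13.43 14.39 12.33
12.62 13.53 11.56
11.67 12.56 10.36
10.83 11.62 9.47
9.98 10.77 9.04
9.24 10.06 8.65
8.89 9.55 8.78
9.01 9.39 9.88
", header = TRUE)

# stack() places the columns one after another: `values` holds the
# readings and `ind` (a factor) holds the originating column name.
s <- stack(x)
cols <- as.character(s$ind)

# Assumes rows are in hour order, so hours repeat 1..nrow(x) per column.
long <- data.frame(
  year        = 2012,
  ID          = sub("_.*$", "", cols),
  Soil_Layer  = sub("^.*_", "", cols),
  Hour        = rep(seq_len(nrow(x)), times = ncol(x)),
  Temperature = s$values
)

head(long)
```

A data frame is used here rather than the blank matrix from the question because an R matrix holds a single type: putting the character `ID` and `Soil_Layer` columns into it would coerce the temperatures to character as well. The reshape2 result can be given the same column names with `names()<-` if preferred.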