2015-01-07

Formatting a data frame in R

I have a fairly complicated task that I need to carry out. I suspect it is possible, but if it is not, please let me know.

Suppose I have the following data:

set.seed(123) 
date1 <- c(seq(as.Date("2011-11-1"),as.Date("2012-1-1"),by = "months"),seq(as.Date("2011-12-1"),as.Date("2012-3-1"),by = "months")) 
date2 <- c(seq(as.Date("2011-12-1"),as.Date("2012-1-1"),by = "months"),seq(as.Date("2011-11-1"),as.Date("2012-1-1"),by = "months")) 
variables <- c(rep("Number of Coins",3),rep("Number of Shoes",4),rep("Number of Coins",2),rep("Number of Shoes",3)) 
date <- c(date1,date2) 
names <- c(rep("Jim",7),rep("Arnold",5)) 
value <- rnorm(12) 
df <- data.frame(names, date, variables, value) 

    names       date       variables       value
1     Jim 2011-11-01 Number of Coins -0.56047565
2     Jim 2011-12-01 Number of Coins -0.23017749
3     Jim 2012-01-01 Number of Coins  1.55870831
4     Jim 2011-12-01 Number of Shoes  0.07050839
5     Jim 2012-01-01 Number of Shoes  0.12928774
6     Jim 2012-02-01 Number of Shoes  1.71506499
7     Jim 2012-03-01 Number of Shoes  0.46091621
8  Arnold 2011-12-01 Number of Coins -1.26506123
9  Arnold 2012-01-01 Number of Coins -0.68685285
10 Arnold 2011-11-01 Number of Shoes -0.44566197
11 Arnold 2011-12-01 Number of Shoes  1.22408180
12 Arnold 2012-01-01 Number of Shoes  0.35981383

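The problem with this data is that the variable names occupy a single column. I would like to create two columns, one for Number of Shoes and one for Number of Coins, while making sure the dates stay the same. Ideally, I would like to turn this data frame into the following: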

   names    date Number.of.Coins Number.of.Shoes
1    Jim 11/1/11      -0.5604756              NA
2    Jim 12/1/11      -0.2301775      0.07050839
3    Jim  1/1/12       1.5587083      0.12928774
4    Jim  2/1/12              NA      1.71506499
5    Jim  3/1/12              NA      0.46091621
6 Arnold 11/1/11              NA     -0.44566197
7 Arnold 12/1/11      -1.2650612      1.22408180
8 Arnold  1/1/12      -0.6868529      0.35981383

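So, within each name, the time frame should run from the earliest date of any variable to the latest date of any variable. This creates the need for NAs wherever a variable has no value on a given date. Hopefully that makes sense!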


You can use melt and dcast from reshape2: http://seananderson.ca/2013/10/19/reshape.html – ajkl

Answers


As @Ajinkya Kale suggested, you can handle this task with the reshape2 package.

library(reshape2)
dcast(df, names + date ~ variables, value.var = "value")

If you want to make sure the dates are in chronological order, you can use arrange() from the dplyr package:

library(dplyr)
arrange(dcast(df, names + date ~ variables, value.var = "value"), names, date)

#    names       date Number of Coins Number of Shoes
#1  Arnold 2011-11-01              NA     -0.44566197
#2  Arnold 2011-12-01      -1.2650612      1.22408180
#3  Arnold 2012-01-01      -0.6868529      0.35981383
#4     Jim 2011-11-01      -0.5604756              NA
#5     Jim 2011-12-01      -0.2301775      0.07050839
#6     Jim 2012-01-01       1.5587083      0.12928774
#7     Jim 2012-02-01              NA      1.71506499
#8     Jim 2012-03-01              NA      0.46091621
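
The output above lists Arnold before Jim and keeps the spaces in the column headers, whereas the layout requested in the question has Jim first and headers such as Number.of.Coins. The following is a minimal sketch of one way to get closer to that layout, assuming reshape2 is installed. The intermediate name `wide` is only illustrative, and the data is rebuilt from the question so the snippet runs on its own.

```r
library(reshape2)

# Rebuild the question's data so the sketch is self-contained.
set.seed(123)
df <- data.frame(
  names = c(rep("Jim", 7), rep("Arnold", 5)),
  date = c(seq(as.Date("2011-11-01"), as.Date("2012-01-01"), by = "months"),
           seq(as.Date("2011-12-01"), as.Date("2012-03-01"), by = "months"),
           seq(as.Date("2011-12-01"), as.Date("2012-01-01"), by = "months"),
           seq(as.Date("2011-11-01"), as.Date("2012-01-01"), by = "months")),
  variables = c(rep("Number of Coins", 3), rep("Number of Shoes", 4),
                rep("Number of Coins", 2), rep("Number of Shoes", 3)),
  value = rnorm(12)
)

# Make names a factor whose levels follow order of first appearance,
# so that sorting on it puts Jim before Arnold.
df$names <- factor(df$names, levels = unique(df$names))

# Long to wide: one row per names/date pair, one column per variable.
wide <- dcast(df, names + date ~ variables, value.var = "value")

# Sort explicitly by name (factor level order), then by date.
wide <- wide[order(wide$names, wide$date), ]
rownames(wide) <- NULL

# make.names() turns "Number of Coins" into "Number.of.Coins".
names(wide) <- make.names(names(wide))
wide
```

The date column stays a Date here; displaying it as 11/1/11 would need a separate formatting step.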

Another option is to use spread() from the tidyr package:

library(tidyr) 
spread(df, variables, value) 
#    names       date Number of Coins Number of Shoes
#1  Arnold 2011-11-01              NA     -0.44566197
#2  Arnold 2011-12-01      -1.2650612      1.22408180
#3  Arnold 2012-01-01      -0.6868529      0.35981383
#4     Jim 2011-11-01      -0.5604756              NA
#5     Jim 2011-12-01      -0.2301775      0.07050839
#6     Jim 2012-01-01       1.5587083      0.12928774
#7     Jim 2012-02-01              NA      1.71506499
#8     Jim 2012-03-01              NA      0.46091621
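
In more recent tidyr releases (1.0.0 and later), spread() is documented as superseded by pivot_wider(). As a rough sketch of the same reshaping with that function, assuming a tidyr version that provides it, the snippet below rebuilds the question's data and sorts the result in base R. The name `wide` is only illustrative.

```r
library(tidyr)

# Rebuild the question's data so the sketch is self-contained.
set.seed(123)
df <- data.frame(
  names = c(rep("Jim", 7), rep("Arnold", 5)),
  date = c(seq(as.Date("2011-11-01"), as.Date("2012-01-01"), by = "months"),
           seq(as.Date("2011-12-01"), as.Date("2012-03-01"), by = "months"),
           seq(as.Date("2011-12-01"), as.Date("2012-01-01"), by = "months"),
           seq(as.Date("2011-11-01"), as.Date("2012-01-01"), by = "months")),
  variables = c(rep("Number of Coins", 3), rep("Number of Shoes", 4),
                rep("Number of Coins", 2), rep("Number of Shoes", 3)),
  value = rnorm(12)
)

# names_from supplies the new column headers, values_from the cell values.
# names/date pairs that lack a variable are filled with NA.
wide <- pivot_wider(df, names_from = variables, values_from = value)

# pivot_wider() returns a tibble; sort it by name, then by date.
wide <- wide[order(wide$names, wide$date), ]
wide
```

The column headers keep their spaces, so they need backticks when referenced directly, for example wide$`Number of Coins`.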