2013-07-11

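R data frame reshape, restructure, and/or merge

I am trying to reshape and "expand" a data.frame based on values contained in the data.frame. Below is the structure of the data frame I am starting with:

Starting structure: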

'data.frame': 9 obs. of 5 variables: 
$ Delivery.Location : chr "Henry" "Henry" "Henry" "Henry" ... 
$ Price    : num 2.97 2.96 2.91 2.85 2.89 ... 
$ Trade.Date   : Date, format: "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06" ... 
$ Delivery.Start.Date : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-07" ... 
$ Delivery.End.Date : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-09" ... 

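This price data is from a market known as the "next day market" for natural gas, because physical delivery of the gas is, generally, the day after the gas was traded (i.e. Trade.Date above). I emphasize "generally" because there are exceptions on weekends and holidays, in which case the delivery period may span multiple days (i.e. 2-3 days). However, the data structure provides variables that explicitly state the Delivery.Start.Date and Delivery.End.Date.

I am trying to restructure the data.frame in the following way to produce some time series plots and do further analysis:

Desired structure: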

$ Delivery.Location 
$ Trade.Date 
$ Delivery.Date <<<-- How do I create this variable? 
$ Price 

How do I create the Delivery.Date variable based on the existing Delivery.Start.Date and Delivery.End.Date variables?

In other words, the data for the 2012-01-06 Trade.Date looks like this:

Delivery Location Price  Trade.Date  Delivery.Start.Date  Delivery.End.Date  
Henry    2.851322 2012-01-06  2012-01-07    2012-01-09 

I would like to somehow "fill in" the Delivery.Location and Price for 2012-01-08 to get something like this:

Delivery Location  Price  Trade.Date  Delivery.Date 
Henry     2.851322 2012-01-06  2012-01-07 
Henry     2.851322 2012-01-06  2012-01-08 <--new record "filled in" 
Henry     2.851322 2012-01-06  2012-01-09 

Here is an example subset of my data.frame:

##-------------------------------------------------------------------------------------------- 
## sample data 
##-------------------------------------------------------------------------------------------- 
df <- structure(list(Delivery.Location = c("Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry"), Price = c(2.96539814293754, 2.95907652120467, 2.9064360152398, 2.85132233314846, 2.89036418816388,2.9655845029802, 2.80773394495413, 2.70207160426346, 2.67173237617745), Trade.Date = structure(c(15342, 15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352), class = "Date"), Delivery.Start.Date = structure(c(15343, 15344, 15345, 15346, 15349, 15350, 15351, 15352, 15353), class = "Date"), Delivery.End.Date = structure(c(15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352, 15356), class = "Date")), .Names = c("Delivery.Location", "Price", "Trade.Date", "Delivery.Start.Date", "Delivery.End.Date"), row.names = c(35L, 150L, 263L, 377L, 493L, 607L, 724L, 838L, 955L), class = "data.frame") 

str(df) 

##-------------------------------------------------------------------------------------------- 
## create sequence of Delivery.Dates to potentially use 
##-------------------------------------------------------------------------------------------- 
rng <- range(c(range(df$Delivery.Start.Date), range(df$Delivery.End.Date))) 
Delivery.Date <- seq(rng[1], rng[2], by=1) 

Any assistance or general direction would be greatly appreciated.

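Can you be specific about what you want? – Metrics

@Metrics: I edited my question; hopefully it is clearer. My apologies for not being more specific from the start. – MikeTP

NP; do you want the difference between the delivery start and end dates? – Metrics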

Answers

2

You can use ddply from plyr:

library(plyr)

# Split by trade, then emit one row per delivery day in that trade's period
ddply(
    df,
    c("Delivery.Location", "Trade.Date"),
    function(trade)
        data.frame(
            trade,
            Delivery.Date = seq(
                from = trade$Delivery.Start.Date,
                to   = trade$Delivery.End.Date,
                by   = "day"
            )
        )
)

Of course, you still have to implement the logic regarding weekends, holidays, etc.
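What the weekend and holiday logic should be depends on the market's rules, which this thread does not spell out. As a minimal sketch, assuming all that is needed is to flag which expanded delivery dates fall on a Saturday or Sunday (the names `delivery_dates` and `is_weekend` are illustrative, and holidays would need a separate calendar that is not covered here):

```r
# Hypothetical expanded delivery dates for the 2012-01-06 trade
delivery_dates <- seq(as.Date("2012-01-07"), as.Date("2012-01-09"), by = "day")

# "%u" gives the ISO weekday as "1" (Monday) .. "7" (Sunday),
# so the result does not depend on the locale's day names
iso_weekday <- as.integer(format(delivery_dates, "%u"))
is_weekend  <- iso_weekday >= 6L

flagged <- data.frame(Delivery.Date = delivery_dates, is_weekend = is_weekend)
flagged
```

Using the numeric weekday rather than `weekdays()` avoids comparing against day names, which vary by locale.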

I also assumed that Delivery.Location and Trade.Date are sufficient to identify a single trade.
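If that assumption is a concern, the same expansion can be sketched in base R by working row by row instead of group by group, so no key needs to identify a trade. This is only one possible approach, not the answerer's method; it uses three rows of the question's sample data, and the names `n_days`, `idx`, `offset` and `expanded` are illustrative:

```r
# Three rows taken from the question's sample data
df <- data.frame(
  Delivery.Location   = c("Henry", "Henry", "Henry"),
  Price               = c(2.9064360152398, 2.85132233314846, 2.89036418816388),
  Trade.Date          = as.Date(c("2012-01-05", "2012-01-06", "2012-01-09")),
  Delivery.Start.Date = as.Date(c("2012-01-06", "2012-01-07", "2012-01-10")),
  Delivery.End.Date   = as.Date(c("2012-01-06", "2012-01-09", "2012-01-10")),
  stringsAsFactors    = FALSE
)

# Number of delivery days covered by each row, counting both end points
n_days <- as.integer(difftime(df$Delivery.End.Date, df$Delivery.Start.Date,
                              units = "days")) + 1L

# Repeat each row index once per delivery day
idx <- rep(seq_len(nrow(df)), times = n_days)

# Day offset within each delivery period: 0, 1, ..., n_days - 1
offset <- sequence(n_days) - 1L

expanded <- data.frame(
  Delivery.Location = df$Delivery.Location[idx],
  Trade.Date        = df$Trade.Date[idx],
  Delivery.Date     = df$Delivery.Start.Date[idx] + offset,
  Price             = df$Price[idx],
  stringsAsFactors  = FALSE
)
expanded
```

The 2012-01-06 trade becomes three rows (2012-01-07 through 2012-01-09) carrying the same price, while the single-day trades stay as one row each.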

1

Is this OK?

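library(plyr)

# Lookup table: the Price observed on each Trade.Date
lookuptable <- df[, 2:3]

# Treat the delivery start and end dates as trade dates so they can be looked up
Trade.Date <- df[, 4]
filluptable1 <- as.data.frame(Trade.Date)
Trade.Date <- df[, 5]
filluptable2 <- as.data.frame(Trade.Date)

# Price traded on the delivery start date (NA if no trade on that date)
myfillstart <- join(filluptable1, lookuptable, by = "Trade.Date")
myfillstart <- rename(myfillstart, c(Trade.Date = "Delivery.Start.Date"))
myfillstart <- rename(myfillstart, c(Price = "Price.Start.Date"))

# Price traded on the delivery end date (NA if no trade on that date)
myfillend <- join(filluptable2, lookuptable, by = "Trade.Date")
myfillend <- rename(myfillend, c(Trade.Date = "Delivery.End.Date"))
myfillend <- rename(myfillend, c(Price = "Price.End.Date"))

finaldf <- cbind(df[, 1:3], myfillstart, myfillend)

finaldf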
    Delivery.Location    Price Trade.Date Delivery.Start.Date Price.Start.Date Delivery.End.Date Price.End.Date
35              Henry 2.965398 2012-01-03          2012-01-04         2.959077        2012-01-04       2.959077
150             Henry 2.959077 2012-01-04          2012-01-05         2.906436        2012-01-05       2.906436
263             Henry 2.906436 2012-01-05          2012-01-06         2.851322        2012-01-06       2.851322
377             Henry 2.851322 2012-01-06          2012-01-07               NA        2012-01-09       2.890364
493             Henry 2.890364 2012-01-09          2012-01-10         2.965585        2012-01-10       2.965585
607             Henry 2.965585 2012-01-10          2012-01-11         2.807734        2012-01-11       2.807734
724             Henry 2.807734 2012-01-11          2012-01-12         2.702072        2012-01-12       2.702072
838             Henry 2.702072 2012-01-12          2012-01-13         2.671732        2012-01-13       2.671732
955             Henry 2.671732 2012-01-13          2012-01-14               NA        2012-01-17             NA

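Note: since all of your rows have the same location, I did not look up by location. However, you can do that as well. The code looks a bit messy. Here is an alternative you can also go through.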
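For the case where the lookup should also take the location into account, one possible sketch uses base R `match()` on a composite key instead of the plyr joins. This is an assumption about how the lookup might be extended, not code from the answer; it uses three rows of the question's sample data, and the `*_key` names are illustrative:

```r
# Three rows taken from the question's sample data
df <- data.frame(
  Delivery.Location   = c("Henry", "Henry", "Henry"),
  Price               = c(2.9064360152398, 2.85132233314846, 2.89036418816388),
  Trade.Date          = as.Date(c("2012-01-05", "2012-01-06", "2012-01-09")),
  Delivery.Start.Date = as.Date(c("2012-01-06", "2012-01-07", "2012-01-10")),
  Delivery.End.Date   = as.Date(c("2012-01-06", "2012-01-09", "2012-01-10")),
  stringsAsFactors    = FALSE
)

# Composite keys (location plus date) so prices are only matched within a location
trade_key <- paste(df$Delivery.Location, df$Trade.Date, sep = "|")
start_key <- paste(df$Delivery.Location, df$Delivery.Start.Date, sep = "|")
end_key   <- paste(df$Delivery.Location, df$Delivery.End.Date, sep = "|")

# match() returns the row of the first matching trade, or NA when there is none
df$Price.Start.Date <- df$Price[match(start_key, trade_key)]
df$Price.End.Date   <- df$Price[match(end_key, trade_key)]
df
```

For the 2012-01-06 trade this gives NA for the start-date price (nothing traded on 2012-01-07) and the 2012-01-09 price for the end date, in line with row 377 of the output above. The last row is NA in both new columns only because this three-row subset has no trade on 2012-01-10.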