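R data frame reshape, restructure, and/or merge
I am trying to reshape and "expand" a data.frame based on values contained in the data.frame. Below is the structure of the data frame I am starting with:
Starting structure: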
'data.frame': 9 obs. of 5 variables:
$ Delivery.Location : chr "Henry" "Henry" "Henry" "Henry" ...
$ Price : num 2.97 2.96 2.91 2.85 2.89 ...
$ Trade.Date : Date, format: "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06" ...
$ Delivery.Start.Date : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-07" ...
$ Delivery.End.Date : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-09" ...
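This price data comes from the natural gas market referred to as the "next-day market", so called because physical delivery typically takes place the day after the gas is traded (i.e., the day after Trade.Date above). I emphasize "typically" because there are exceptions on weekends and holidays, in which case the delivery period may span multiple days (i.e., 2-3 days). However, the data structure provides variables that explicitly state the Delivery.Start.Date and Delivery.End.Date.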
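I am trying to restructure the data.frame in the following way so that I can produce some time series charts and do further analysis:
Desired structure: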
$ Delivery.Location
$ Trade.Date
$ Delivery.Date <<<-- How do I create this variable?
$ Price
How do I create the Delivery.Date variable based on the existing Delivery.Start.Date and Delivery.End.Date variables?
In other words, the data for the 2012-01-06 Trade.Date looks like this:
Delivery.Location  Price     Trade.Date  Delivery.Start.Date  Delivery.End.Date
Henry              2.851322  2012-01-06  2012-01-07           2012-01-09
I would like to somehow "fill in" Delivery.Location and Price for 2012-01-08, to get something like this:
Delivery.Location  Price     Trade.Date  Delivery.Date
Henry              2.851322  2012-01-06  2012-01-07
Henry              2.851322  2012-01-06  2012-01-08  <-- new record "filled in"
Henry              2.851322  2012-01-06  2012-01-09
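To make the "fill in" idea concrete for that single trade, here is a minimal sketch using base R's seq() on Date values. The object names row, days, and filled are made up for illustration, and the values are copied from the 2012-01-06 example above; this is one possible way to do it, not necessarily the best one.

```r
# One trade row, copied from the 2012-01-06 example (names are illustrative)
row <- data.frame(
  Delivery.Location   = "Henry",
  Price               = 2.851322,
  Trade.Date          = as.Date("2012-01-06"),
  Delivery.Start.Date = as.Date("2012-01-07"),
  Delivery.End.Date   = as.Date("2012-01-09"),
  stringsAsFactors    = FALSE
)

# Every calendar day in the delivery window, both ends included
days <- seq(row$Delivery.Start.Date, row$Delivery.End.Date, by = "day")

# data.frame() recycles the length-one columns to match the number of days
filled <- data.frame(
  Delivery.Location = row$Delivery.Location,
  Price             = row$Price,
  Trade.Date        = row$Trade.Date,
  Delivery.Date     = days,
  stringsAsFactors  = FALSE
)

print(filled)
```

The result has one row per delivery day, with Delivery.Location, Price, and Trade.Date repeated on each row.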
Below is an example subset of my data.frame:
##--------------------------------------------------------------------------------------------
## sample data
##--------------------------------------------------------------------------------------------
df <- structure(list(
  Delivery.Location = c("Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry"),
  Price = c(2.96539814293754, 2.95907652120467, 2.9064360152398, 2.85132233314846, 2.89036418816388,
            2.9655845029802, 2.80773394495413, 2.70207160426346, 2.67173237617745),
  Trade.Date = structure(c(15342, 15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352), class = "Date"),
  Delivery.Start.Date = structure(c(15343, 15344, 15345, 15346, 15349, 15350, 15351, 15352, 15353), class = "Date"),
  Delivery.End.Date = structure(c(15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352, 15356), class = "Date")),
  .Names = c("Delivery.Location", "Price", "Trade.Date", "Delivery.Start.Date", "Delivery.End.Date"),
  row.names = c(35L, 150L, 263L, 377L, 493L, 607L, 724L, 838L, 955L),
  class = "data.frame")
str(df)
##--------------------------------------------------------------------------------------------
## create sequence of Delivery.Dates to potentially use
##--------------------------------------------------------------------------------------------
rng <- range(c(range(df$Delivery.Start.Date), range(df$Delivery.End.Date)))
Delivery.Date <- seq(rng[1], rng[2], by=1)
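A rough sketch of the whole-frame version of what I am after, assuming the expansion can be done by repeating each row once per delivery day. It uses only base R (rep() and sequence()); the names demo, n.days, idx, offset, and expanded are hypothetical, and demo is just two rows taken from the sample data so the snippet runs on its own.

```r
# Two rows from the sample data: a one-day and a three-day delivery window
demo <- data.frame(
  Delivery.Location   = c("Henry", "Henry"),
  Price               = c(2.9064360152398, 2.85132233314846),
  Trade.Date          = as.Date(c("2012-01-05", "2012-01-06")),
  Delivery.Start.Date = as.Date(c("2012-01-06", "2012-01-07")),
  Delivery.End.Date   = as.Date(c("2012-01-06", "2012-01-09")),
  stringsAsFactors    = FALSE
)

# Number of delivery days per trade, counting both the start and end date
n.days <- as.integer(demo$Delivery.End.Date - demo$Delivery.Start.Date) + 1L

# Row index repeated once per delivery day
idx <- rep(seq_len(nrow(demo)), times = n.days)

# Day offset within each window, restarting at 0 for every trade
offset <- sequence(n.days) - 1L

# Assemble the desired structure: one row per trade per delivery day
expanded <- data.frame(
  Delivery.Location = demo$Delivery.Location[idx],
  Trade.Date        = demo$Trade.Date[idx],
  Delivery.Date     = demo$Delivery.Start.Date[idx] + offset,
  Price             = demo$Price[idx],
  stringsAsFactors  = FALSE
)

print(expanded)
```

If this is a sensible direction, the same steps should apply to df unchanged, since they only rely on the Delivery.Start.Date and Delivery.End.Date columns being of class Date.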
Any assistance or general direction would be greatly appreciated.
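Can you be more specific about what you want? – Metrics
@Metrics: I edited my question to hopefully make it clearer. My apologies for not being more specific from the start. – MikeTP
NP; do you want the difference between the delivery start and end dates? – Metrics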