2016-01-27

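I have two data.tables that I am trying to merge. One holds company market values over time, the other holds each company's dividend history over time. I am trying to find out how much each company has paid out in each quarter and put that value next to the market value data for that quarter. How do I do a data.table rolling join?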

library(magrittr) 
library(data.table) 
library(zoo) 
library(lubridate) 

set.seed(1337) 
# data table of company market values 
companies <- 
    data.table(companyID = 1:10, 
       Sedol = rep(c("91772E", "7A662B"), each = 5), 
       Date = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1), 
       MktCap = c(100 + cumsum(rnorm(5,5)), 
          50 + cumsum(rnorm(5,1,5)))) %>% 
    setkey(Sedol, Date) 

# data table of dividends 
dividends <- 
    data.table(DivID = 1:7, 
       Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)), 
       Date = as.Date(c('2004-11-19', '2005-01-13', '2005-01-29', 
           '2005-10-01', '2005-06-29', '2005-06-30', 
           '2006-04-17')), 
       DivAmnt = rnorm(7, .8, .3)) %>% 
    setkey(Sedol, Date) 

I believe this is a case where you could use a data.table rolling join, something like:

dividends[companies, roll = "nearest"] 

to try to get a dataset that looks like this:

 DivID Sedol  Date DivAmnt companyID MktCap 
    1: NA 7A662B  <NA>  NA   6 61.21061 
    2:  5 7A662B 2005-06-29 0.7772631   7 66.92951 
    3:  6 7A662B 2005-06-30 1.1815343   7 66.92951 
    4: NA 7A662B  <NA>  NA   8 78.33914 
    5: NA 7A662B  <NA>  NA   9 88.92473 
    6: NA 7A662B  <NA>  NA  10 87.85067 
    7:  2 91772E 2005-01-13 0.2964291   1 105.19249 
    8:  3 91772E 2005-01-29 0.8472649   1 105.19249 
    9: NA 91772E  <NA>  NA   2 108.74579 
    10:  4 91772E 2005-10-01 1.2467408   3 113.42261 
    11: NA 91772E  <NA>  NA   4 120.04491 
    12: NA 91772E  <NA>  NA   5 124.35588 

(Note that I have matched the dividends to the company market values for the exact quarter.)

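But I am not entirely sure how to execute it. What does 'roll' mean when it is given a value? (Can you pass a date? Does a number give the number of days to roll forward?) Changing 'rollends' doesn't seem to get me what I want either.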
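To make the 'roll' options concrete, here is a minimal sketch on made-up toy data (the tables 'quarters' and 'divs' and their values are hypothetical, not taken from the question). As far as I understand the data.table documentation, 'roll' acts on the last join column: TRUE (or Inf) carries the previous row forward, a negative value carries the next row backward, a finite number limits the gap in units of that column (days, for a Date), and "nearest" takes the closest row. A rolled match returns only one dividend row per quarter, so a roll on its own cannot attach two dividends to one quarter as in the desired output above.

```r
library(data.table)

# Hypothetical toy data: one company, two quarter-end dates, two dividends
quarters <- data.table(Sedol = "A",
                       Date  = as.Date(c("2005-03-31", "2005-06-30")))
divs <- data.table(Sedol   = "A",
                   Date    = as.Date(c("2005-01-13", "2005-07-10")),
                   DivAmnt = c(0.3, 0.5))

# roll = 90: carry the previous dividend forward, but by at most 90 days
fwd  <- divs[quarters, on = .(Sedol, Date), roll = 90]
# roll = -90: carry the next dividend backward, but by at most 90 days
bwd  <- divs[quarters, on = .(Sedol, Date), roll = -90]
# roll = "nearest": take whichever dividend is closest in time
near <- divs[quarters, on = .(Sedol, Date), roll = "nearest"]

fwd$DivAmnt   # 0.3 NA  (2005-01-13 is 77 days before 2005-03-31, 168 days before 2005-06-30)
bwd$DivAmnt   # NA 0.5  (2005-07-10 is 101 days after 2005-03-31, 10 days after 2005-06-30)
near$DivAmnt  # 0.3 0.5
```

The limit is a plain number here because a Date key is stored as a count of days.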

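In the end, I mapped the dividend dates to quarter ends and then joined. That works, but it won't help if I ever actually need to know how to perform a rolling join. In your answer, could you describe a situation where a rolling join is the only solution, and help me understand how to perform one?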
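For reference, the quarter-end mapping described above could look roughly like this. It is only a sketch on made-up data: it assumes zoo's as.yearqtr() together with as.Date(..., frac = 1) to get the last day of each calendar quarter, followed by a plain equi-join on Sedol and that date. Keep in mind that strict calendar quarters put a dividend dated 2005-10-01 into Q4, whereas the desired output above pairs it with the 2005-09-30 row.

```r
library(data.table)
library(zoo)

# Hypothetical toy data (values made up for illustration)
companies <- data.table(companyID = 1:2, Sedol = "91772E",
                        Date   = as.Date(c("2005-03-31", "2005-06-30")),
                        MktCap = c(105.2, 108.7))
dividends <- data.table(DivID = 1:3, Sedol = "91772E",
                        Date    = as.Date(c("2005-01-13", "2005-01-29", "2005-10-01")),
                        DivAmnt = c(0.30, 0.85, 1.25))

# Map each dividend date to the last day of its calendar quarter
dividends[, QtrEnd := as.Date(as.yearqtr(Date), frac = 1)]

# Plain equi-join: every company row is kept, with all dividends of that quarter
# (a quarter with no dividend gets NA; the Q4 dividend has no company row and is dropped)
res <- dividends[companies, on = .(Sedol, QtrEnd = Date)]
res[, .(companyID, DivID, QtrEnd, DivAmnt, MktCap)]
```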


Can you describe what you are trying to do? – mtoto


Somehow your code doesn't produce proper data.tables; could you provide a dput() of 'companies' instead? – Jaap


I forgot to include the 'library(lubridate)' statement. Thanks for catching that. – jks612

Answer


Instead of a rolling join, you probably want to use an overlap join with data.table's foverlaps function:

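# (uses the 'companies' and 'dividends' data defined at the end of this answer;
#  days() comes from lubridate)
# create an interval in the 'companies' data.table:
# from 90 days before to 15 days after each quarter-end date
companies[, `:=` (start = compDate - days(90), end = compDate + days(15))] 
# create a second date in the 'dividends' data.table, so each dividend is a zero-length interval
dividends[, Date2 := divDate] 

# set the keys for the two data.tables (the last two key columns are the interval bounds)
setkey(companies, Sedol, start, end) 
setkey(dividends, Sedol, divDate, Date2) 

# create a vector of column names which can be removed afterwards 
deletecols <- c("Date2","start","end") 

# perform the overlap join and remove the helper columns 
res <- foverlaps(companies, dividends)[, (deletecols) := NULL] 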
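A possible alternative, assuming data.table 1.9.8 or later (the version that added non-equi joins), is to state the same window directly in 'on =' without the helper Date2 column or any keys. The sketch below uses small made-up tables with the answer's column names. It selects x.divDate in j because, as I understand it, the join columns of a non-equi join result take their values from 'companies' (the window bounds) rather than from 'dividends'.

```r
library(data.table)

# Hypothetical toy data with the answer's column names
companies <- data.table(companyID = 1:3, Sedol = "91772E",
                        compDate = as.Date(c("2005-03-31", "2005-06-30", "2005-09-30")),
                        MktCap   = c(105.2, 108.7, 113.4))
dividends <- data.table(DivID = 1:4, Sedol = "91772E",
                        divDate = as.Date(c("2004-11-19", "2005-01-13",
                                            "2005-01-29", "2005-10-01")),
                        DivAmnt = c(0.50, 0.30, 0.85, 1.25))

# Same window as the answer: 90 days before to 15 days after each quarter end
companies[, `:=`(start = compDate - 90, end = compDate + 15)]

# Non-equi join: every dividend dated inside the window; company rows with no match get NA
res <- dividends[companies,
                 on = .(Sedol, divDate >= start, divDate <= end),
                 .(Sedol, DivID, divDate = x.divDate, DivAmnt,
                   companyID, compDate, MktCap)]
res
```

Here the 2005-10-01 dividend lands on the 2005-09-30 row because the window reaches 15 days past the quarter end.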

Result:

> res 
    Sedol DivID divDate DivAmnt companyID compDate MktCap 
1: 7A662B NA  <NA>  NA   6 2005-03-31 61.21061 
2: 7A662B  5 2005-06-29 0.7772631   7 2005-06-30 66.92951 
3: 7A662B  6 2005-06-30 1.1815343   7 2005-06-30 66.92951 
4: 7A662B NA  <NA>  NA   8 2005-09-30 78.33914 
5: 7A662B NA  <NA>  NA   9 2005-12-31 88.92473 
6: 7A662B NA  <NA>  NA  10 2006-03-31 87.85067 
7: 91772E  2 2005-01-13 0.2964291   1 2005-03-31 105.19249 
8: 91772E  3 2005-01-29 0.8472649   1 2005-03-31 105.19249 
9: 91772E NA  <NA>  NA   2 2005-06-30 108.74579 
10: 91772E  4 2005-10-01 1.2467408   3 2005-09-30 113.42261 
11: 91772E NA  <NA>  NA   4 2005-12-31 120.04491 
12: 91772E NA  <NA>  NA   5 2006-03-31 124.35588 

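Data used (the same as in the question, but with the date columns renamed to compDate and divDate, and without setting the keys):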

library(data.table) 
library(lubridate) 

set.seed(1337) 
companies <- data.table(companyID = 1:10, Sedol = rep(c("91772E", "7A662B"), each = 5), 
         compDate = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1), 
         MktCap = c(100 + cumsum(rnorm(5,5)), 50 + cumsum(rnorm(5,1,5)))) 
dividends <- data.table(DivID = 1:7, Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)), 
         divDate = as.Date(c('2004-11-19','2005-01-13','2005-01-29','2005-10-01','2005-06-29','2005-06-30','2006-04-17')), 
         DivAmnt = rnorm(7, .8, .3)) 

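When is a rolling join more appropriate? The documentation seems to suggest this is exactly the kind of thing rolling joins were created for. – jks612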


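@jks612 I'll think about this again. I remember that a rolling join didn't give the expected result, but I'll look at it again. Hopefully I can get to it this weekend. – Jaap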
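Regarding when a rolling join fits better: a typical case is an "as-of" lookup, where each row of one table needs exactly one row from the other, for example the market value prevailing on each dividend date (say, to compute a yield). The sketch below uses made-up data; roll = TRUE carries the most recent quarter-end observation forward to each dividend date.

```r
library(data.table)

# Hypothetical toy data (values made up for illustration)
companies <- data.table(Sedol  = "91772E",
                        Date   = as.Date(c("2004-12-31", "2005-03-31",
                                           "2005-06-30", "2005-09-30")),
                        MktCap = c(100, 105, 109, 113))
dividends <- data.table(DivID = 1:3, Sedol = "91772E",
                        Date    = as.Date(c("2005-01-13", "2005-03-31", "2005-10-01")),
                        DivAmnt = c(0.3, 0.8, 1.2))

# For each dividend, take the most recent market value on or before its date
# (roll = TRUE is last observation carried forward; one result row per dividend)
asof <- companies[dividends, on = .(Sedol, Date), roll = TRUE]
asof[, Yield := DivAmnt / MktCap]
asof
```

An exact date match (2005-03-31) is used as is; the other two dividends pick up the preceding quarter end.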


Nice idea, thanks! – msp