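data.table join using two columns from one table and one column from the other
I have two data tables, which we can call 'weights' and 'values'.
The 'weights' table has 5 columns, as follows: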
first POSIXct
late POSIXct
nodeid integer
aggid integer
weight numeric
The 'values' table has these columns:
nodeid integer
Date POSIXct
hour integer
value decimal
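The idea is to produce a new table that combines the node values into aggregate nodes as a weighted sum, using the weights. However, the weights change over time, so each value has to be matched to the weight whose 'first' and 'late' dates bracket the value's date. The SQL to do this would look something like this: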
select v.Date, v.hour, w.aggid, sum(v.value*w.weight) as aggvalue
from values v inner join weights w
on v.nodeid=w.nodeid and v.date between w.first and w.late
group by aggid, date, hour
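I really don't know where to start with this one, given the between logic in the SQL. Is this possible in data.table syntax, or do I need to turn the 'weights' table into one with a row for every day instead of using ranges?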
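For reference, the "one row per day" alternative mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: it uses a three-node subset of the sample data from this question, converts the date strings with as.Date, and the names 'daily', 'joined' and 'result' are made up for the example.

```r
library(data.table)

# Subset of the question's sample data: nodes 1-3, two dates.
values <- data.table(
  nodeid = c(1L, 2L, 3L, 1L, 2L, 3L),
  Date   = as.Date(c("2013-07-02", "2013-07-02", "2013-07-02",
                     "2013-07-08", "2013-07-08", "2013-07-08")),
  hour   = c(1L, 1L, 1L, 2L, 2L, 2L),
  value  = c(8.234, 6.89, 0.281, 8.689, 6.683, 7.27)
)
weights <- data.table(
  first  = as.Date(rep(c("2013-07-01", "2013-07-08"), each = 3)),
  late   = as.Date(rep(c("2013-07-07", "2013-07-20"), each = 3)),
  nodeid = rep(1:3, times = 2),
  aggid  = 1L,
  weight = c(0.5, 0.25, 0.25, 0.6, 0.2, 0.2)
)

# Expand every weight range into one row per calendar day.
# Each row of 'weights' is its own group here, so 'first' and 'late'
# are single dates inside j.
daily <- weights[, .(Date = seq(first, late, by = "day")),
                 by = .(nodeid, aggid, weight, first, late)]

# Ordinary equi-join on node and day, dropping values with no weight.
joined <- daily[values, on = .(nodeid, Date), nomatch = 0L]

# Weighted sum per aggregate node, date and hour.
result <- joined[, .(aggvalue = sum(value * weight)),
                 by = .(aggid, Date, hour)]
print(result)
```

This works, but the expanded table grows with the length of every range, which is the cost the question is hoping to avoid.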
Here is some sample data (sorry that it is so long)...
values<-data.table(nodeid = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
6L, 6L, 6L, 6L, 6L), Date = c("2013-07-02", "2013-07-02", "2013-07-05",
"2013-07-08", "2013-07-10", "2013-07-02", "2013-07-02", "2013-07-05",
"2013-07-08", "2013-07-10", "2013-07-02", "2013-07-02", "2013-07-05",
"2013-07-08", "2013-07-10", "2013-07-02", "2013-07-02", "2013-07-05",
"2013-07-08", "2013-07-10", "2013-07-02", "2013-07-02", "2013-07-05",
"2013-07-08", "2013-07-10", "2013-07-02", "2013-07-02", "2013-07-05",
"2013-07-08", "2013-07-10"), hour = c(1L, 2L, 23L, 2L, 2L, 1L,
2L, 23L, 2L, 2L, 1L, 2L, 23L, 2L, 2L, 1L, 2L, 23L, 2L, 2L, 1L,
2L, 23L, 2L, 2L, 1L, 2L, 23L, 2L, 2L), value = c(8.234, 3.218,
0.787, 8.689, 6.218, 6.89, 1.914, 2.459, 6.683, 8.122, 0.281,
1.136, 1.993, 7.27, 9.582, 5.777, 1.375, 9.204, 7.862, 0.633,
2.433, 1.842, 7.178, 10.692, 1.417, 1.259, 2.619, 0.031, 6.744,
5.941))
weights<-data.table(first = c("2013-07-01", "2013-07-01", "2013-07-01",
"2013-07-01", "2013-07-01", "2013-07-01", "2013-07-08", "2013-07-08",
"2013-07-08", "2013-07-08", "2013-07-08", "2013-07-08"), late = c("2013-07-07",
"2013-07-07", "2013-07-07", "2013-07-07", "2013-07-07", "2013-07-07",
"2013-07-20", "2013-07-20", "2013-07-20", "2013-07-20", "2013-07-20",
"2013-07-20"), nodeid = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
4L, 5L, 6L), aggid = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L), weight = c(0.5, 0.25, 0.25, 0.3, 0.5, 0.2, 0.6, 0.2,
0.2, 0.4, 0.45, 0.15))
exresults<-data.table(aggid = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), Date = c("2013-07-02", "2013-07-02", "2013-07-02", "2013-07-02",
"2013-07-05", "2013-07-05", "2013-07-08", "2013-07-08", "2013-07-10",
"2013-07-10"), hour = c(1L, 1L, 2L, 2L, 23L, 23L, 2L, 2L, 2L,
2L), aggvalue = c(5.90975, 3.2014, 2.3715, 1.8573, 1.5065, 6.3564,
8.004, 8.9678, 7.2716, 1.782))
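One way the between condition could be written directly is a data.table non-equi join. The sketch below is hedged: it assumes a data.table version with non-equi join support (1.9.8 or later), it converts the date strings with as.IDate because range operators in 'on' are not supported on character columns, and it uses only nodes 1 to 3 on two dates from the sample data. The names 'joined', 'contrib' and 'result' are illustrative.

```r
library(data.table)

# Subset of the sample data above: nodes 1-3 (all in aggid 1), two dates.
values <- data.table(
  nodeid = c(1L, 2L, 3L, 1L, 2L, 3L),
  Date   = c("2013-07-02", "2013-07-02", "2013-07-02",
             "2013-07-08", "2013-07-08", "2013-07-08"),
  hour   = c(1L, 1L, 1L, 2L, 2L, 2L),
  value  = c(8.234, 6.89, 0.281, 8.689, 6.683, 7.27)
)
weights <- data.table(
  first  = rep(c("2013-07-01", "2013-07-08"), each = 3),
  late   = rep(c("2013-07-07", "2013-07-20"), each = 3),
  nodeid = rep(1:3, times = 2),
  aggid  = 1L,
  weight = c(0.5, 0.25, 0.25, 0.6, 0.2, 0.2)
)

# Convert the character dates so they can be used in range conditions.
values[, Date := as.IDate(Date)]
weights[, `:=`(first = as.IDate(first), late = as.IDate(late))]

# Equivalent of: v.nodeid = w.nodeid AND v.Date BETWEEN w.first AND w.late
# x. refers to columns of 'weights', i. to columns of 'values'.
joined <- weights[values,
                  on = .(nodeid, first <= Date, late >= Date),
                  .(Date = i.Date, hour = i.hour, aggid = x.aggid,
                    contrib = i.value * x.weight),
                  nomatch = 0L]

# Equivalent of the GROUP BY with sum(v.value * w.weight).
result <- joined[, .(aggvalue = sum(contrib)), by = .(aggid, Date, hour)]
print(result)
```

For this subset the two aggvalue results agree with the aggid 1 rows of 'exresults' for 2013-07-02 hour 1 and 2013-07-08 hour 2 (5.90975 and 8.004).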
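I see you don't need the 'first' column, and I admit I need to do more reading on 'roll', but is there a way to use 'first' instead of 'late'? In my real data source I only have the 'first' date, and I had to create the 'late' column myself to get a range. If there is a syntax that lets me use only the 'first' column, I can skip creating 'late' entirely. Is that possible, or is creating it a necessary step? –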
I changed the key on 'weights' to 'first' instead of 'late' and changed 'roll = -Inf' to 'roll = Inf', and that appears to work. –
@DeanMacGregor Yes, that's it – eddi
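The 'first'-only variant discussed in these comments might be sketched as follows. This is an illustration rather than the exact code from the answer: it uses 'on =' instead of setting keys, assumes each node has at most one weight row per 'first' date, converts dates with as.IDate, and again uses only nodes 1 to 3 from the sample data. The names 'joined' and 'result' are made up for the example.

```r
library(data.table)

values <- data.table(
  nodeid = c(1L, 2L, 3L, 1L, 2L, 3L),
  Date   = as.IDate(c("2013-07-02", "2013-07-02", "2013-07-02",
                      "2013-07-08", "2013-07-08", "2013-07-08")),
  hour   = c(1L, 1L, 1L, 2L, 2L, 2L),
  value  = c(8.234, 6.89, 0.281, 8.689, 6.683, 7.27)
)

# Only the start date of each weight period; no 'late' column.
weights <- data.table(
  first  = as.IDate(rep(c("2013-07-01", "2013-07-08"), each = 3)),
  nodeid = rep(1:3, times = 2),
  aggid  = 1L,
  weight = c(0.5, 0.25, 0.25, 0.6, 0.2, 0.2)
)

# Rolling join: for each value row, take the weight row of the same node
# with the latest 'first' that is on or before the value's Date.
# roll applies to the last join column ('first' matched against Date).
joined <- weights[values, on = .(nodeid, first = Date),
                  roll = Inf, nomatch = 0L]

# In the join result the 'first' column carries the value's Date.
result <- joined[, .(aggvalue = sum(value * weight)),
                 by = .(aggid, Date = first, hour)]
print(result)
```

One thing to keep in mind with this form: 'roll = Inf' has no upper bound, so a value dated after the last period still picks up the most recent weight, whereas the 'late' column cut the range off explicitly.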