2017-08-07 184 views

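Calculating growth rates with two grouping variables

I would like to calculate log growth rates by group, and I am struggling to make this work with two variables in the by-clause of data.table. I have a data.table covering production over time, and I want to calculate the log growth rate over time for each group.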

library(zoo) 
library(data.table) 
library(ggplot2) 
library(dplyr) 
DT <- structure(list(Year.Quarter = structure(c(2015, 2015, 2015, 2015, 
              2015, 2015.25, 2015.25, 2015.25, 2015.25, 2015.25, 2015.5, 2015.5, 
              2015.5, 2015.5, 2015.5, 2015.75, 2015.75, 2015.75, 2015.75, 2015.75, 
              2016, 2016, 2016, 2016, 2016, 2016.25, 2016.25, 2016.25, 2016.25, 
              2016.25), class = "yearqtr") 
             ,Group = structure(c(2L, 1L, 4L, 
                  3L, NA, 2L, 1L, 4L, 3L, NA, 2L, 1L, 4L, 3L, NA, 2L, 1L, 4L, 3L, NA, 2L, 1L, 4L, 3L, NA, 2L, 1L, 4L, 3L, NA), .Label = c("1", "2", "3", "4"), class = "factor") 
             , Conventional.Prod = c(11.78, 7.31, 7.34, 9.44, 28.72, 11.32, 5.27, 7.47, 8.08, 27.14, 11.49, 
                   4.65, 7.63, 7.07, 25.93, 10.69, 3.68, 6.96, 6.72, 18.31, 9.28, 
                   3.69, 6.86, 6.34, 19.14, 9.25, 3.69, 6.9, 6.16, 17.7) 
             , Unconventional.Prod = c(15.22, 10.69, 7.66, 15.56, 30.28, 15.68, 10.73, 7.53, 15.92, 29.86, 
                 13.51, 10.35, 7.37, 15.93, 28.07, 13.31, 10.32, 7.04, 16.28, 
           25.69, 12.72, 9.31, 7.14, 16.66, 25.86, 12.75, 9.31, 7.1, 16.84, 24.3)) 
         , .Names = c("Year.Quarter", "Group", "Conventional.Prod", "Unconventional.Prod"), row.names = c(NA, -30L), class = c("data.table", 
                 "data.frame")) 

DT[, .(Conventional.Prod 
     , d.log.Conventional.Prod = log(Conventional.Prod, base = exp(1)) - shift(log(Conventional.Prod, base = exp(1)), n = 1L , fill = NA, type = "lag") 
     , Log.Conventional.Prod = log(Conventional.Prod, base = exp(1)) 
     , Lag.Log.Conventional.Prod = shift(log(Conventional.Prod, base = exp(1)), n = 1L , fill = NA, type = "lag") 
     ), by = list(Group, Year.Quarter)] 

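I don't understand why grouping by the Group variable and Year.Quarter does not work here, or why the lagged value of production cannot be calculated. I don't think the factor variable is the problem, since ordering works just fine.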

DT[order(Group, Year.Quarter)] 

Year.Quarter Group Conventional.Prod Unconventional.Prod 
1:  2015 Q1  1    7.31    10.69 
2:  2015 Q2  1    5.27    10.73 
3:  2015 Q3  1    4.65    10.35 
4:  2015 Q4  1    3.68    10.32 
5:  2016 Q1  1    3.69    9.31 
6:  2016 Q2  1    3.69    9.31 
7:  2015 Q1  2    11.78    15.22 
8:  2015 Q2  2    11.32    15.68 
9:  2015 Q3  2    11.49    13.51 
10:  2015 Q4  2    10.69    13.31 
[...] 

Answers

0

Expanding on the answer by @sirallen, I arrived at a solution that needs no extra function and uses only data.table tools.

setkey(DT, Group, Year.Quarter) 
DT[, .(Year.Quarter, Conventional.Prod 
     , d.log.Conventional.Prod = log(Conventional.Prod, base = exp(1)) - shift(log(Conventional.Prod, base = exp(1)), n = 1L , fill = NA, type = "lag") 
     , Log.Conventional.Prod = log(Conventional.Prod, base = exp(1)) 
     , Lag.Log.Conventional.Prod = shift(log(Conventional.Prod, base = exp(1)), n = 1L , fill = NA, type = "lag") 
     ), by = list(Group)] 
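
As a possible variation on the code above, the growth rates for both production columns could be added to the table by reference with `:=` and `.SDcols`. This is only a sketch under assumptions: `toy` is a small stand-in built from the first three quarters of groups 1 and 2 in the question's data, `Year.Quarter` is stored as a plain number to avoid the zoo dependency, and the names `prod_cols` and `new_cols` are made up for illustration.

```r
library(data.table)

# Stand-in for DT (assumption): first three quarters of groups 1 and 2,
# with Year.Quarter kept as a plain numeric instead of a yearqtr
toy <- data.table(
  Group = rep(c("1", "2"), each = 3),
  Year.Quarter = rep(c(2015.00, 2015.25, 2015.50), times = 2),
  Conventional.Prod = c(7.31, 5.27, 4.65, 11.78, 11.32, 11.49),
  Unconventional.Prod = c(10.69, 10.73, 10.35, 15.22, 15.68, 13.51)
)

# Sort by group, then by time, so that the lag is the previous quarter
setkey(toy, Group, Year.Quarter)

# Illustrative names for the input columns and the new growth columns
prod_cols <- c("Conventional.Prod", "Unconventional.Prod")
new_cols <- paste0("d.log.", prod_cols)

# Add one log-growth column per production column, computed within each Group
toy[, (new_cols) := lapply(.SD, function(x) log(x) - shift(log(x))),
    by = Group, .SDcols = prod_cols]

print(toy)
```

The first row of each group is `NA` because there is no earlier quarter inside that group to take the difference against.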

If someone could explain why it does not work when grouping by two variables, that would be great.

1

You could do this:

setkey(DT, Group, Year.Quarter) 

logG = function(x) c(NA, diff(log(x))) 

DT[!is.na(Group), .(Year.Quarter, logG(Conventional.Prod), logG(Unconventional.Prod)), by='Group'] 

#  Group Year.Quarter   V2   V3 
# 1:  1  2015 Q1   NA   NA 
# 2:  1  2015 Q2 -0.327212911 0.0037348316 
# 3:  1  2015 Q3 -0.125163143 -0.0360570369 
# 4:  1  2015 Q4 -0.233954467 -0.0029027597 
# 5:  1  2016 Q1 0.002713706 -0.1029946688 
# 6:  1  2016 Q2 0.000000000 0.0000000000 
# 7:  2  2015 Q1   NA   NA 
# 8:  2  2015 Q2 -0.039832105 0.0297756625
# 9:  2  2015 Q3 0.014906019 -0.1489558630 
# 10:  2  2015 Q4 -0.072168367 -0.0149145196 
# 11:  2  2016 Q1 -0.141447178 -0.0453400745 
# 12:  2  2016 Q2 -0.003237995 0.0023557137 
# ... 

OK, that works well, but I don't quite understand how keying and grouping only by 'Group' solves this. – hannes101


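Keying is just a way to sort the data, which is a necessary step. And grouping by '.(Group, Year.Quarter)' makes no sense here: each such group contains only one observation of a given product, so there is nothing to lag within it. – sirallen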
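
The point in the comment above can be checked on a small example. The sketch below uses made-up toy values (the table `toy` and its columns are hypothetical, not the question's data) to show that grouping by both variables leaves one row per group, so `shift()` can only return its fill value, while grouping by `Group` alone gives a usable lag.

```r
library(data.table)

# Hypothetical toy data: 2 groups x 3 quarters
toy <- data.table(
  Group = rep(c("A", "B"), each = 3),
  Quarter = rep(1:3, times = 2),
  Prod = c(10, 11, 12, 20, 18, 21)
)
setkey(toy, Group, Quarter)

# Grouping by both variables: every group holds exactly one row ...
sizes <- toy[, .N, by = .(Group, Quarter)]

# ... so the lag inside each group is always the fill value (NA)
by_both <- toy[, .(d.log = log(Prod) - shift(log(Prod))),
               by = .(Group, Quarter)]

# Grouping by Group only: the lag is the previous quarter of the same group
by_group <- toy[, .(Quarter, d.log = log(Prod) - shift(log(Prod))),
                by = Group]

print(sizes)
print(by_both)
print(by_group)
```

With `by = Group`, only the first quarter of each group is `NA`, and no difference is ever taken across two different groups.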


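Yeah, true. I just wanted to make sure that the growth rate is always calculated for each 'Group' and 'Year.Quarter' combination, and not between different groups within each 'Year.Quarter'. Thanks a lot for your answer! – hannes101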