2013-04-14

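Cannot assign a column as.Date by reference in data.table

When using `by =`, I am assigning a new column as `Date` or `IDate`. This creates a column of integers instead of the expected `Date`: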

require(data.table) 
dt <- data.table(date = as.IDate(sample(10000:11000, 10), 
           origin = "1970-01-01")) 
dt[, group := rep(1:2, 5)] 
print(dt) 

#   date group 
# 1: 1997-06-12  1 
# 2: 1998-02-19  2 
# 3: 1998-04-25  1 
# 4: 1998-01-27  2 
# 5: 1997-10-29  1 
# 6: 1998-05-08  2 
# 7: 1999-05-09  1 
# 8: 1999-06-26  2 
# 9: 1997-11-01  1 
# 10: 1997-07-19  2 

This works:

dt[, min.date := min(date)] 
print(dt) 

#   date group min.date 
# 1: 1997-06-12  1 1997-06-12 
# 2: 1998-02-19  2 1997-06-12 
# 3: 1998-04-25  1 1997-06-12 
# 4: 1998-01-27  2 1997-06-12 
# 5: 1997-10-29  1 1997-06-12 
# 6: 1998-05-08  2 1997-06-12 
# 7: 1999-05-09  1 1997-06-12 
# 8: 1999-06-26  2 1997-06-12 
# 9: 1997-11-01  1 1997-06-12 
# 10: 1997-07-19  2 1997-06-12 

But here is the problem:

dt[, min.group.date := as.IDate(min(date)), by = group] 
print(dt) 

#   date group min.date min.group.date 
# 1: 1997-06-12  1 1997-06-12   10024 
# 2: 1998-02-19  2 1997-06-12   10061 
# 3: 1998-04-25  1 1997-06-12   10024 
# 4: 1998-01-27  2 1997-06-12   10061 
# 5: 1997-10-29  1 1997-06-12   10024 
# 6: 1998-05-08  2 1997-06-12   10061 
# 7: 1999-05-09  1 1997-06-12   10024 
# 8: 1999-06-26  2 1997-06-12   10061 
# 9: 1997-11-01  1 1997-06-12   10024 
# 10: 1997-07-19  2 1997-06-12   10061 

`min.group.date` is numeric, not `Date`:

dt[, class(min.group.date)] 

# [1] "numeric" 

If I first initialize the column as `Date` or `IDate`, it works as expected:

dt <- data.table(date = as.IDate(sample(10000:11000, 10), origin = "1970-01-01")) 
dt[, group := rep(1:2, 5)] 

dt[, min.group.date := as.IDate(NA)] 
dt[, min.group.date := min(date), by = group] 

dt[, class(min.group.date)] 
# [1] "IDate" "Date" 
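On versions affected by this bug, another possible workaround (not suggested in the thread, so treat it as a sketch) is to let the grouped assignment produce the numeric day counts and then convert the whole column back with `as.IDate()`, using the same `origin = "1970-01-01"` argument as the question. The numeric result is simulated here with `as.numeric()` so the snippet behaves the same on fixed versions; the dates are a few values taken from the question's output.

```r
library(data.table)

dt <- data.table(
  date  = as.IDate(c("1997-06-12", "1998-02-19", "1998-04-25", "1998-01-27")),
  group = c(1L, 2L, 1L, 2L)
)

# Simulate the buggy result: per-group minimum stored as plain day counts.
dt[, min.group.date := as.numeric(min(date)), by = group]

# Replace the whole column by reference, converting the day counts
# (days since 1970-01-01) back to IDate.
dt[, min.group.date := as.IDate(min.group.date, origin = "1970-01-01")]

print(dt[, class(min.group.date)])
```

Because the second `:=` has no `by`, it replaces the entire column at once, so changing the column's type from numeric to `IDate` is allowed.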

This [bug](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2531&group_id=240&atid=975) might be related.


@statquant - You're right. I've just reported it... (and yes, I marked my own question)


Thanks for reporting this, Paul. It's now fixed in *v1.8.11*. [See here](http://stackoverflow.com/a/18927785/559784). – Arun
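A minimal check, assuming data.table 1.8.11 or later is installed, that the original one-liner now keeps the date class without pre-initializing the column (the dates are a few values from the question's output):

```r
library(data.table)

dt <- data.table(
  date  = as.IDate(c("1997-06-12", "1998-02-19", "1998-04-25", "1998-01-27")),
  group = c(1L, 2L, 1L, 2L)
)

# No pre-initialization: the new column is created inside the grouped := call.
dt[, min.group.date := min(date), by = group]

# On versions that contain the fix this should print "IDate" "Date".
print(class(dt$min.group.date))
```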

Answer


Paul, if all you want is the minimum date by group, this line will do it:

dt[,min(date),by=group] 

You should see the following (the dates obviously differ from those in your example because of the `sample` call):

group   V1 
1:  1 1997-11-19 
2:  2 1997-12-04 

If you want to see every row, you can join the tables:

setkey(dt,group) #always good practice 
dt_min=dt[,min(date),by=group] 
setnames(dt_min,"V1","min.group.Date") #you should NOT use colnames (see help('setnames'))
dt[dt_min] 


    group  date min.group.Date 
1:  1 1999-01-30  1997-11-19 
2:  1 1999-11-27  1997-11-19 
3:  1 1999-11-11  1997-11-19 
4:  1 1997-11-19  1997-11-19 
5:  1 1999-05-06  1997-11-19 
6:  2 1999-07-11  1997-12-04 
7:  2 1997-12-04  1997-12-04 
8:  2 1998-07-28  1997-12-04 
9:  2 1998-10-23  1997-12-04 
10:  2 1998-06-01  1997-12-04 
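A variant of the same join, assuming data.table 1.9.6 or later (where the `on =` argument of `[.data.table` is available). It joins on `group` without calling `setkey()`, so `dt` is not reordered. The input dates are a few values from the output above, chosen so that the result is deterministic:

```r
library(data.table)

dt <- data.table(
  date  = as.IDate(c("1999-01-30", "1997-11-19", "1997-12-04", "1998-07-28")),
  group = c(1L, 1L, 2L, 2L)
)

# Per-group minimum as its own table; the unnamed j result is called V1.
dt_min <- dt[, min(date), by = group]
setnames(dt_min, "V1", "min.group.Date")  # rename by reference

# Ad hoc join on 'group': one output row per matching row of dt.
res <- dt[dt_min, on = "group"]
print(res)
```

The rows come out in the order of `dt_min`, with each group's minimum repeated for every matching row of `dt`.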

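Well, sure, I can think of several workarounds, but they turn one elegant line of code, `dt[, min.date := min(date), by = group]`, into four. Why doesn't assignment by reference preserve the date class?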


Paul, drop `group` from the column name `min.group.date`. For example, `dt[, min.date := min(date), by = group]` will give you the answer. – hvollmeier


That is only true if `min.date` has already been initialized as `Date` or `IDate`. The name of the column has nothing to do with it.