Custom function by week with data.table
I have a data frame:
a <- data.frame(BEG_D=as.Date(c("2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-08")) , day=c("Mon","Tues","Wed","Thurs","Fri","Satur","Sun","Mon"), WkNo=c(1,1,1,1,1,1,1,2))
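Here BEG_D is the beginning day of each week (treating "2014-01-01" as a Sunday). To generate the remaining dates within each week, I wrote a custom function and used it with ddply: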
library(plyr)
date_generator <- function(f){
# 0-based offset of each row within its week
f$seq <- seq(nrow(f))-1
f$date <- as.Date(f$BEG_D + f$seq)
return(f)
}
b <- ddply(a,.(WkNo),date_generator)
This works fine. The resulting data frame has these new columns:
seq = c(0,1,2,3,4,5,6,0)
date = c("2014-01-01","2014-01-02","2014-01-03","2014-01-04","2014-01-05","2014-01-06","2014-01-07","2014-01-08")
But it takes a long time on my large data frame, and I have other ddply operations that take even longer. So I decided to try data.table on the same data:
date_generator <- function(f){
f[,seq := seq(nrow(f))-1]
f[,.(date = as.Date(BEG_D + seq))]
return(f)
}
a[,date_generator(.SD),by=.(WkNo)]
This throws an error:
Error in `[.data.table`(f, , `:=`(seq, seq(nrow(f)) - 1)) : .SD is locked. Using := in .SD's j is reserved for possible future use; a tortuously flexible way to modify by group. Use := in j directly to modify by group by reference.
What is the correct way to write this custom function with data.table, and why is ddply so slow on large data frames?
I think you are looking for `a[, newdate := BEG_D + 1:.N - 1, by = WkNo]` (I don't use plyr, so I can't compare) – Frank
Thanks @Frank, that works fine for the data.table case. I don't have enough reputation to upvote your comment, though :( – abhiieor
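For anyone landing here later, a minimal sketch of the approach suggested in the comment above, assuming `a` is first built as (or converted to) a data.table. As the error message says, `.SD` is locked, so a helper cannot use `:=` on it. One way around this is to use `:=` directly in `j`, or to have the helper return the new columns as a list instead of modifying its input. The names `date_generator2`, `seq2` and `date2` are made up for illustration and are not from the original post.

```r
library(data.table)

# Same data as in the question, built as a data.table
a <- data.table(
  BEG_D = as.Date(c(rep("2014-01-01", 7), "2014-01-08")),
  day   = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Satur", "Sun", "Mon"),
  WkNo  = c(1, 1, 1, 1, 1, 1, 1, 2)
)

# Option 1: use := directly in j, grouped by week.
# .N is the number of rows in the current group.
a[, seq := seq_len(.N) - 1L, by = WkNo]
a[, date := BEG_D + seq]

# Option 2: a hypothetical helper that takes a column and returns
# the new columns as a list, leaving .SD untouched.
date_generator2 <- function(beg_d) {
  s <- seq_along(beg_d) - 1L
  list(seq2 = s, date2 = beg_d + s)
}
a[, c("seq2", "date2") := date_generator2(BEG_D), by = WkNo]

print(a)
```

Both options add the columns by reference, so no per-group copy of the table is assembled and bound back together. That is probably a large part of the speed difference from ddply on big inputs, though it is worth timing on your own data.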