2013-02-06 26 views
28

I have written a (fairly naive) function to randomly select a date/time between two specified days:

# set start and end dates to sample between 
day.start <- "2012/01/01" 
day.end <- "2012/12/31" 

# define a random date/time selection function 
rand.day.time <- function(day.start,day.end,size) { 
    dayseq <- seq.Date(as.Date(day.start),as.Date(day.end),by="day") 
    dayselect <- sample(dayseq,size,replace=TRUE) 
    hourselect <- sample(0:23,size,replace=TRUE)  # hours run 0-23; 24 is not a valid hour
    minselect <- sample(0:59,size,replace=TRUE) 
    as.POSIXlt(paste(dayselect, " ", hourselect,":",minselect,sep=""))  # explicit space between date and time
} 

This gives a random sample of dates and times between the two dates:

> rand.day.time(day.start,day.end,size=3) 
[1] "2012-02-07 21:42:00" "2012-09-02 07:27:00" "2012-06-15 01:13:00" 

However, this becomes noticeably slow as the sample size increases.

# some benchmarking 
> system.time(rand.day.time(day.start,day.end,size=100000)) 
    user system elapsed 
    4.68 0.03 4.70 
> system.time(rand.day.time(day.start,day.end,size=200000)) 
    user system elapsed 
    9.42 0.06 9.49 

Can anyone suggest a more efficient way to do this?

Answers

39

Ahh, another date/time problem that we can reduce to working in floats :)

Try this function

R> latemail <- function(N, st="2012/01/01", et="2012/12/31") { 
+  st <- as.POSIXct(as.Date(st)) 
+  et <- as.POSIXct(as.Date(et)) 
+  dt <- as.numeric(difftime(et,st,units="secs")) 
+  ev <- sort(runif(N, 0, dt)) 
+  rt <- st + ev 
+ } 
R> 

We compute the difftime in seconds, then "merely" draw uniforms over that range and sort the result. Add that to the start time and you're done:

R> set.seed(42); print(latemail(5))  ## round to date, or hour, or ... 
[1] "2012-04-14 05:34:56.369022 CDT" "2012-08-22 00:41:26.683809 CDT" 
[3] "2012-10-29 21:43:16.335659 CDT" "2012-11-29 15:42:03.387701 CST" 
[5] "2012-12-07 18:46:50.233761 CST" 
R> system.time(latemail(100000)) 
    user system elapsed 
    0.024 0.000 0.021 
R> system.time(latemail(200000)) 
    user system elapsed 
    0.044 0.000 0.045 
R> system.time(latemail(10000000)) ## a few more than in your example :) 
    user system elapsed 
    3.240 0.172 3.428 
R> 
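
The '## round to date, or hour, or ...' remark in the example above can be sketched as follows. This is only an illustration under stated assumptions: the timestamps are generated in UTC rather than in the local time zone shown in the output, and the names 'x', 'x_min', 'x_hour' and 'x_day' are made up for this example.

```r
set.seed(42)

# Assumption: work in UTC so daylight-saving changes do not interfere
st <- as.POSIXct("2012-01-01", tz = "UTC")
et <- as.POSIXct("2012-12-31", tz = "UTC")

# Same idea as latemail(): sorted uniform offsets in seconds added to the start
span <- as.numeric(difftime(et, st, units = "secs"))
x <- st + sort(runif(5, 0, span))

# trunc() on a date-time returns POSIXlt, so convert back to POSIXct
x_min  <- as.POSIXct(trunc(x, units = "mins"))   # drop the seconds
x_hour <- as.POSIXct(trunc(x, units = "hours"))  # drop minutes and seconds
x_day  <- as.Date(x, tz = "UTC")                 # keep the calendar date only
```

Truncating after sampling keeps the fast numeric draw and only pays for the rounding step on the final vector.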
+0

Cheers - works a treat, and it's quick too. – thelatemail

+10

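First rule of working with dates and times: *always* remember that 'POSIXct' is really just a number, with fractional seconds since the epoch. Ditto for 'Date' and fractional days. A lot of problems then become easy. –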
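
A small sketch of what that comment means in practice. The variable names 'p' and 'd' are made up for illustration, and UTC is assumed so the numbers do not depend on the local time zone.

```r
# A POSIXct value is stored as seconds since 1970-01-01 00:00:00 UTC
p <- as.POSIXct("1970-01-02 00:00:30", tz = "UTC")
p_secs <- as.numeric(p)   # 86430: one day (86400 s) plus 30 s

# A Date value is stored as days since 1970-01-01
d <- as.Date("1970-01-11")
d_days <- as.numeric(d)   # 10

# Because they are numbers underneath, plain arithmetic works
p_later <- p + 0.5        # half a second later, still POSIXct
d_next  <- d + 1          # the next day, still Date
```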

+4

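The genius of this answer is the 'st + ev' trick. Round-tripping through 'POSIXct' is painful because you have to specify the origin explicitly. Otherwise 'runif(N, as.POSIXct(st), as.POSIXct(et))' would get you 90% of the way there, but you then need 'as.POSIXct(..., origin="1970-01-01")'. – user295691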
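
To make the comparison in that comment concrete, here is a rough sketch of both routes. It assumes UTC throughout, and the names 'secs', 'via_origin' and 'via_offset' are invented for the example.

```r
set.seed(1)
st <- as.POSIXct("2012-01-01", tz = "UTC")
et <- as.POSIXct("2012-12-31", tz = "UTC")

# Route 1: draw raw epoch seconds, then convert back with an explicit origin
secs <- runif(5, as.numeric(st), as.numeric(et))
via_origin <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Route 2 (the 'st + ev' trick): draw offsets and add them to the start,
# so the result stays POSIXct and no origin is needed
ev <- runif(5, 0, as.numeric(difftime(et, st, units = "secs")))
via_offset <- st + ev
```

Both routes give date-times between 'st' and 'et'. The second simply avoids the conversion step.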

2

Something like this will also work. Sorry about the random data frame, I just threw it in there so you can see a plot.

data=as.data.frame(list(ID=1:10, 
        variable=rnorm(10,50,10))) 

#This function will generate a uniform sample of dates from 
#within a designated start and end date: 

rand.date=function(start.day,end.day,data){ 
    size=dim(data)[1]  
    days=seq.Date(as.Date(start.day),as.Date(end.day),by="day") 
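    pick.day=sample(length(days),size,replace=TRUE)  # integer indices; runif() values are truncated when indexing, so the last day would never be picked
    days[pick.day]  # return the sampled dates visibly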
} 

#This will create a new column within your data frame called date: 

data$date=rand.date("2014-01-01","2014-02-28",data) 

#and this will order your data frame by date: 

data=data[order(data$date),] 

#Finally, you can see how the data looks 

plot(data$date,data$variable,type="b")
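
If only whole days are needed, the day sequence can be skipped by adding random integer offsets to the start date. This is a minimal sketch rather than part of the answer above, and 'rand.date.simple' is a hypothetical name used only for illustration.

```r
# Hypothetical helper: sample 'size' dates uniformly from start.day..end.day
rand.date.simple <- function(start.day, end.day, size) {
    start <- as.Date(start.day)
    # Number of whole days between the two dates
    n.days <- as.integer(as.Date(end.day) - start)
    # Offsets 0..n.days include both end points
    start + sample(0:n.days, size, replace = TRUE)
}

set.seed(1)
d <- rand.date.simple("2014-01-01", "2014-02-28", 10)
```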