How can I return a sampled date between the minimum and maximum dates in an R data frame as an additional column?
Course MinEnrollmentDate MaxEnrollmentDate
Maths 3/11/2016 3/4/2016
Chemistry 6/11/2016 6/4/2016
Physics 9/11/2016 9/4/2016
English 12/11/2016 12/4/2016
Science 3/11/2017 3/4/2017
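Assuming you are working with a data frame named mydata, you can use the following snippet:
# seq() takes a single start and a single end, so draw one date per row
lo <- as.Date(mydata$MinEnrollmentDate, format = "%m/%d/%Y")
hi <- as.Date(mydata$MaxEnrollmentDate, format = "%m/%d/%Y")
mydata$sampledate <- as.Date(sapply(seq_along(lo), function(i) sample(seq(min(lo[i], hi[i]), max(lo[i], hi[i]), by = "day"), 1)), origin = "1970-01-01")
Basically, for each row this first generates the sequence of all days between the start and end dates, then randomly draws one date from that sequence and writes it to your data frame. The month/day/year format is an assumption; adjust format= if your dates are day/month/year.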
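To make the two steps concrete, here is a minimal sketch for a single pair of dates. The variable names (start, end, all_days, picked) and the seed are only illustrative, and the dates are assumed to be already parsed and in ascending order.

```r
# One row's worth of the idea: build every day in the range, then pick one.
start <- as.Date("2016-03-04")
end   <- as.Date("2016-03-11")

all_days <- seq(start, end, by = "day")  # every date from start to end, inclusive

set.seed(1)                              # only to make the draw repeatable
picked <- sample(all_days, 1)            # one random date, still of class Date
picked
```

Because sample() indexes into the Date vector, the result keeps its Date class and always falls inside the range, endpoints included.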
Using dplyr, we can do this:
library(dplyr)
df <- df %>%
  rowwise() %>%
  mutate(MinEnrollmentDate = as.Date(MinEnrollmentDate, format = '%m/%d/%Y'),
         MaxEnrollmentDate = as.Date(MaxEnrollmentDate, format = '%m/%d/%Y'),
         # the step is '-1 day' because MinEnrollmentDate is later than MaxEnrollmentDate in this data
         sampleDate = sample(seq(MinEnrollmentDate, MaxEnrollmentDate, '-1 day'), 1))
df
#> Source: local data frame [5 x 4]
#> Groups: <by row>
#>
#> # A tibble: 5 x 4
#> Course MinEnrollmentDate MaxEnrollmentDate sampleDate
#> <chr> <date> <date> <date>
#> 1 Maths 2016-03-11 2016-03-04 2016-03-08
#> 2 Chemistry 2016-06-11 2016-06-04 2016-06-09
#> 3 Physics 2016-09-11 2016-09-04 2016-09-06
#> 4 English 2016-12-11 2016-12-04 2016-12-09
#> 5 Science 2017-03-11 2017-03-04 2017-03-06
I'm not sure I got your date format right, since it is ambiguous; feel free to correct the format= part.
Data:
df <- read.table(text = 'Course MinEnrollmentDate MaxEnrollmentDate
Maths 3/11/2016 3/4/2016
Chemistry 6/11/2016 6/4/2016
Physics 9/11/2016 9/4/2016
English 12/11/2016 12/4/2016
Science 3/11/2017 3/4/2017', header = T, stringsAsFactors = F)
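On the format ambiguity: a quick sketch of how the same string parses under the two candidate formats. The variable names are illustrative only; which format is right depends on where the data came from.

```r
x <- "3/11/2016"

as_mdy <- as.Date(x, format = "%m/%d/%Y")  # read as month/day/year
as_dmy <- as.Date(x, format = "%d/%m/%Y")  # read as day/month/year

format(as_mdy, "%Y-%m-%d")  # "2016-03-11"
format(as_dmy, "%Y-%m-%d")  # "2016-11-03"

# A value above 12 in the first field would settle it: such a string
# cannot be parsed as month/day/year and comes back as NA.
bad <- as.Date("13/4/2016", format = "%m/%d/%Y")
is.na(bad)  # TRUE
```

In the sample data every first field is 12 or less, so both formats parse without error and only knowledge of the source can decide between them.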
You can compute the number of days between the two dates:
min_d <- as.Date(data$MinEnrollmentDate, format="%d/%m/%Y")
max_d <- as.Date(data$MaxEnrollmentDate, format="%d/%m/%Y")
days <- abs(as.integer(max_d - min_d))
and then add a random number of days, between 0 and that difference and drawn with sample(), to the earlier of the two dates:
start <- pmin(min_d, max_d)
for(i in seq_along(days)) {
  data[i,4] <- as.character(start[i] + sample(0:days[i],1))
}
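If you would rather avoid the loop, the same idea can be vectorised. This is only a sketch under a few assumptions: the dates are day/month/year, the column name sampleDate is made up for illustration, and a two-row copy of the data is built inline so the snippet runs on its own.

```r
df <- data.frame(
  Course = c("Maths", "Chemistry"),
  MinEnrollmentDate = c("3/11/2016", "6/11/2016"),
  MaxEnrollmentDate = c("3/4/2016", "6/4/2016"),
  stringsAsFactors = FALSE
)

min_d <- as.Date(df$MinEnrollmentDate, format = "%d/%m/%Y")
max_d <- as.Date(df$MaxEnrollmentDate, format = "%d/%m/%Y")

start <- pmin(min_d, max_d)             # earlier date of each row
span  <- abs(as.integer(max_d - min_d)) # whole days between the two dates

# floor(u * (span + 1)) with u in (0, 1) is a uniform integer in 0..span
offset <- floor(runif(nrow(df)) * (span + 1))

df$sampleDate <- start + offset         # Date + number of days stays a Date
df
```

Unlike the loop, this keeps the new column as a Date rather than a character string.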
A step-by-step lubridate solution, for completeness (using GGamba's df):
if (!require(lubridate)) {
  install.packages("lubridate")
  library(lubridate)  # require() returned FALSE, so attach the package after installing it
}
df <- read.table(text = 'Course MinEnrollmentDate MaxEnrollmentDate
Maths 3/11/2016 3/4/2016
Chemistry 6/11/2016 6/4/2016
Physics 9/11/2016 9/4/2016
English 12/11/2016 12/4/2016
Science 3/11/2017 3/4/2017', header = T, stringsAsFactors = F)
# Parse as Date rather than POSIXct so the differences are whole days
min_d <- as.Date(df$MinEnrollmentDate, format = "%d/%m/%Y")
max_d <- as.Date(df$MaxEnrollmentDate, format = "%d/%m/%Y")
no_days <- abs(as.integer(max_d - min_d))
random_days <- sapply(no_days, function(x) sample(0:x, size = 1))
# Offset from the earlier date so the result stays inside the range
df$random_date <- pmin(min_d, max_d) + days(random_days)
I think the column names 'MinEnrollmentDate' and 'MaxEnrollmentDate' have been swapped. Ideally, 'MaxEnrollmentDate' should be >= 'MinEnrollmentDate'. – Aramis7d
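If the columns really are swapped, one way to normalise them before sampling is to take the row-wise earlier and later date. This is a sketch only: it assumes month/day/year dates and builds a two-row copy of the data inline so that it runs on its own.

```r
df <- data.frame(
  Course = c("Maths", "Chemistry"),
  MinEnrollmentDate = c("3/11/2016", "6/11/2016"),
  MaxEnrollmentDate = c("3/4/2016", "6/4/2016"),
  stringsAsFactors = FALSE
)

a <- as.Date(df$MinEnrollmentDate, format = "%m/%d/%Y")
b <- as.Date(df$MaxEnrollmentDate, format = "%m/%d/%Y")

# pmin()/pmax() work element-wise on Date vectors, so each row ends up
# with the earlier date in Min and the later date in Max.
df$MinEnrollmentDate <- pmin(a, b)
df$MaxEnrollmentDate <- pmax(a, b)
df
```

After this step a plain seq(MinEnrollmentDate, MaxEnrollmentDate, by = "day") works row by row without a negative step.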