我有一個類似的數據集:如何使用dplyr聚集事件的多個實例,並創建一個整潔tibble
library(tidyverse)
df <- tibble(
subjid = 1:5,
event_1 = c("Watery eyes", # Event number 1
"Sore throat",
"Vomiting",
"Gastroenteritis viral",
"Dry Mouth"),
start_date_1 = as.Date("2017-01-02") + 0:4,
stop_date_1 = as.Date("2017-01-03") + 0:4,
severity_1 = 1,
related_to_drug_1 = 0,
event_2 = c("Nausea", # Event number 2
"Dizziness",
"Cough",
"Disorientation",
"Diarrhea"),
start_date_2 = as.Date("2017-02-02") + 0:4,
stop_date_2 = as.Date("2017-02-03") + 0:4,
severity_2 = 2,
related_to_drug_2 = 1,
event_3 = c("Eczema", # Event number 3
"Sinusitis",
"Abdominal discomfort",
"Muscle spasms",
"Nasopharyngitis"),
start_date_3 = as.Date("2017-03-02") + 0:4,
stop_date_3 = as.Date("2017-03-03") + 0:4,
severity_3 = 2,
related_to_drug_3 = 1
)
df
# A tibble: 5 × 16
subjid event_1 start_date_1 stop_date_1 severity_1 related_to_drug_1 event_2 start_date_2 stop_date_2 severity_2 related_to_drug_2 event_3
<int> <chr> <date> <date> <dbl> <dbl> <chr> <date> <date> <dbl> <dbl> <chr>
1 1 Watery eyes 2017-01-02 2017-01-03 1 0 Nausea 2017-02-02 2017-02-03 2 1 Eczema
2 2 Sore throat 2017-01-03 2017-01-04 1 0 Dizziness 2017-02-03 2017-02-04 2 1 Sinusitis
3 3 Vomiting 2017-01-04 2017-01-05 1 0 Cough 2017-02-04 2017-02-05 2 1 Abdominal discomfort
4 4 Gastroenteritis viral 2017-01-05 2017-01-06 1 0 Disorientation 2017-02-05 2017-02-06 2 1 Muscle spasms
5 5 Dry Mouth 2017-01-06 2017-01-07 1 0 Diarrhea 2017-02-06 2017-02-07 2 1 Nasopharyngitis
# ... with 4 more variables: start_date_3 <date>, stop_date_3 <date>, severity_3 <dbl>, related_to_drug_3 <dbl>
然而,有數據和超過100個「事件」有更多的行/系列的列。數據框由每個主題的一行組成,其中包含不良事件及其相關屬性,列在列中,該列用下劃線命名,以指示它們屬於哪個事件。我想用tidyr來收集這些事件爲tibble像這樣:
# A tibble: 15 × 7
subjid event_number event start_date stop_date severity related_to_drug
<int> <int> <chr> <date> <date> <int> <int>
1 1 1 Watery eyes 2017-01-02 2017-01-03 1 0
2 2 1 Sore throat 2017-01-03 2017-01-04 1 0
3 3 1 Vomiting 2017-01-04 2017-01-05 1 0
4 4 1 Gastroenteritis viral 2017-01-05 2017-01-06 1 0
5 5 1 Dry Mouth 2017-01-06 2017-01-07 1 0
6 1 2 Nausea 2017-02-02 2017-02-03 2 1
7 2 2 Dizziness 2017-02-03 2017-02-04 2 1
8 3 2 Cough 2017-02-04 2017-02-05 2 1
9 4 2 Disorientation 2017-02-05 2017-02-06 2 1
10 5 2 Diarrhea 2017-02-06 2017-02-07 2 1
11 1 3 Eczema 2017-03-02 2017-03-03 3 2
12 2 3 Sinusitis 2017-03-03 2017-03-04 3 2
13 3 3 Abdominal discomfort 2017-03-04 2017-03-05 3 2
14 4 3 Muscle spasms 2017-03-05 2017-03-06 3 2
15 5 3 Nasopharyngitis 2017-03-06 2017-03-07 3 2
這將有一行每一個不良事件和列確定爲特定的事件屬性。
感謝;這正是我需要的!我添加了:%>% mutate(start_date = as_date(as.numeric(start_date)))%>% mutate(stop_date = as_date(as.numeric(stop_date))),它就像我需要的一樣工作!再次感謝!!!! – jsly
@jsly dplyr提示:您可以通過一次調用mutate來做多個更改。例如:'mutate(A = as_date(A),B = as_date(B))'。通過適當的縮進,這可以不那麼凌亂。 (或者更糟) –