2017-03-17

How to gather multiple instances of an event using dplyr and create a tidy tibble

I have a dataset like this:

library(tidyverse) 

df <- tibble(
    subjid = 1:5, 
    event_1 = c("Watery eyes",   # Event number 1 
      "Sore throat", 
      "Vomiting", 
      "Gastroenteritis viral", 
      "Dry Mouth"), 
    start_date_1 = as.Date("2017-01-02") + 0:4, 
    stop_date_1 = as.Date("2017-01-03") + 0:4, 
    severity_1 = 1, 
    related_to_drug_1 = 0, 
    event_2 = c("Nausea",    # Event number 2 
      "Dizziness", 
      "Cough", 
      "Disorientation", 
      "Diarrhea"), 
    start_date_2 = as.Date("2017-02-02") + 0:4, 
    stop_date_2 = as.Date("2017-02-03") + 0:4, 
    severity_2 = 2, 
    related_to_drug_2 = 1, 
    event_3 = c("Eczema",    # Event number 3 
      "Sinusitis", 
      "Abdominal discomfort", 
      "Muscle spasms", 
      "Nasopharyngitis"), 
    start_date_3 = as.Date("2017-03-02") + 0:4, 
    stop_date_3 = as.Date("2017-03-03") + 0:4, 
    severity_3 = 2, 
    related_to_drug_3 = 1 
) 
df 

# A tibble: 5 × 16 
    subjid    event_1 start_date_1 stop_date_1 severity_1 related_to_drug_1  event_2 start_date_2 stop_date_2 severity_2 related_to_drug_2    event_3 
    <int>     <chr>  <date>  <date>  <dbl>    <dbl>   <chr>  <date>  <date>  <dbl>    <dbl>    <chr> 
1  1   Watery eyes 2017-01-02 2017-01-03   1     0   Nausea 2017-02-02 2017-02-03   2     1    Eczema 
2  2   Sore throat 2017-01-03 2017-01-04   1     0  Dizziness 2017-02-03 2017-02-04   2     1   Sinusitis 
3  3    Vomiting 2017-01-04 2017-01-05   1     0   Cough 2017-02-04 2017-02-05   2     1 Abdominal discomfort 
4  4 Gastroenteritis viral 2017-01-05 2017-01-06   1     0 Disorientation 2017-02-05 2017-02-06   2     1  Muscle spasms 
5  5    Dry Mouth 2017-01-06 2017-01-07   1     0  Diarrhea 2017-02-06 2017-02-07   2     1  Nasopharyngitis 
# ... with 4 more variables: start_date_3 <date>, stop_date_3 <date>, severity_3 <dbl>, related_to_drug_3 <dbl> 

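However, the real data has more rows and more than 100 "events", each with its own series of columns. The data frame has one row per subject. The adverse events and their associated attributes are spread across columns, and each column name ends in an underscore plus a number that indicates which event it belongs to. I would like to use tidyr to gather these events into a tibble like this: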

# A tibble: 15 × 7 
    subjid event_number     event start_date stop_date severity related_to_drug 
    <int>  <int>     <chr>  <date>  <date> <int>    <int> 
1  1   1   Watery eyes 2017-01-02 2017-01-03  1     0 
2  2   1   Sore throat 2017-01-03 2017-01-04  1     0 
3  3   1    Vomiting 2017-01-04 2017-01-05  1     0 
4  4   1 Gastroenteritis viral 2017-01-05 2017-01-06  1     0 
5  5   1    Dry Mouth 2017-01-06 2017-01-07  1     0 
6  1   2    Nausea 2017-02-02 2017-02-03  2     1 
7  2   2    Dizziness 2017-02-03 2017-02-04  2     1 
8  3   2     Cough 2017-02-04 2017-02-05  2     1 
9  4   2  Disorientation 2017-02-05 2017-02-06  2     1 
10  5   2    Diarrhea 2017-02-06 2017-02-07  2     1 
11  1   3    Eczema 2017-03-02 2017-03-03  2     1 
12  2   3    Sinusitis 2017-03-03 2017-03-04  2     1 
13  3   3 Abdominal discomfort 2017-03-04 2017-03-05  2     1 
14  4   3   Muscle spasms 2017-03-05 2017-03-06  2     1 
15  5   3  Nasopharyngitis 2017-03-06 2017-03-07  2     1 

This would have one row per adverse event, with columns holding the attributes of that particular event.

Answers

1

You can do this with the following code:

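df %>% 
    gather(Var, Val, -1) %>%                           # stack every column except subjid
    mutate(Var = gsub('_(\\d+)', '!!\\1', Var)) %>%    # mark the event number with '!!'
    separate(Var, c('Var', 'Event'), sep = '!!') %>%   # split attribute name from event number
    spread(Var, Val)                                   # one column per attribute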

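Unfortunately, this destroys the column classes: gathering columns of different types into one value column coerces them all to character. The classes need to be restored afterwards, which you can do with a call to mutate.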

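(Also note that the mutate line after gather is there because your column names themselves contain "_", and I wanted to split off only the event number.)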
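
As a minimal sketch of the class repair described above, the pipeline can end with a single mutate call that converts each column back. The data set `df_small` and the result name `tidy_small` are hypothetical, introduced only to keep the example short and self-contained; the column names follow the question. It assumes that gather stores dates as their underlying day counts in character form, which is consistent with the `as.numeric()` conversion reported in the comment below.

```r
library(tidyverse)

# Hypothetical two-subject, two-event subset with the question's column naming
df_small <- tibble(
  subjid = 1:2,
  event_1 = c("Watery eyes", "Sore throat"),
  start_date_1 = as.Date("2017-01-02") + 0:1,
  stop_date_1 = as.Date("2017-01-03") + 0:1,
  severity_1 = 1,
  related_to_drug_1 = 0,
  event_2 = c("Nausea", "Dizziness"),
  start_date_2 = as.Date("2017-02-02") + 0:1,
  stop_date_2 = as.Date("2017-02-03") + 0:1,
  severity_2 = 2,
  related_to_drug_2 = 1
)

tidy_small <- df_small %>%
  gather(Var, Val, -subjid) %>%                      # warns: attributes are dropped
  mutate(Var = gsub("_(\\d+)$", "!!\\1", Var)) %>%   # mark the trailing event number
  separate(Var, c("Var", "Event"), sep = "!!") %>%
  spread(Var, Val) %>%
  mutate(
    # every gathered value is character here, so convert each column back
    Event = as.integer(Event),
    start_date = as.Date(as.numeric(start_date), origin = "1970-01-01"),
    stop_date = as.Date(as.numeric(stop_date), origin = "1970-01-01"),
    severity = as.numeric(severity),
    related_to_drug = as.numeric(related_to_drug)
  )

tidy_small
```

With more than 100 events the conversion list stays the same length, because it depends on the number of attributes per event rather than on the number of events.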

+0

Thanks; this is exactly what I needed! I added: %>% mutate(start_date = as_date(as.numeric(start_date))) %>% mutate(stop_date = as_date(as.numeric(stop_date))), and it works just the way I need. Thanks again! – jsly

+0

@jsly dplyr tip: you can make several changes in a single call to mutate. For example: 'mutate(A = as_date(A), B = as_date(B))'. With proper indentation, this can be less messy. (Or worse) –

1

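A more convoluted way to do this which, quite importantly, preserves the classes.
Start with the column names, split them by event number, build one data frame per event, and finally stack the data frames vertically: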

names(df) %>% 
    setdiff("subjid") %>% 
    split(sub(".*_(\\d+)$", "\\1", x = .)) %>% 
    map(~ select(df, all_of(c("subjid", .x)))) %>% 
    map(~ setNames(.x, nm = sub("(.*)_\\d+$", "\\1", x = names(.x)))) %>% 
    map2(names(.), ~ mutate(.x, event_number = .y)) %>% 
    bind_rows() %>% 
    select(subjid, event_number, everything()) 
# # A tibble: 15 × 7 
# subjid event_number     event start_date stop_date severity related_to_drug 
#  <int>  <chr>     <chr>  <date>  <date> <dbl>   <dbl> 
# 1  1   1   Watery eyes 2017-01-02 2017-01-03  1    0 
# 2  2   1   Sore throat 2017-01-03 2017-01-04  1    0 
# 3  3   1    Vomiting 2017-01-04 2017-01-05  1    0 
# 4  4   1 Gastroenteritis viral 2017-01-05 2017-01-06  1    0 
# 5  5   1    Dry Mouth 2017-01-06 2017-01-07  1    0 
# 6  1   2    Nausea 2017-02-02 2017-02-03  2    1 
# 7  2   2    Dizziness 2017-02-03 2017-02-04  2    1 
# 8  3   2     Cough 2017-02-04 2017-02-05  2    1 
# 9  4   2  Disorientation 2017-02-05 2017-02-06  2    1 
# 10  5   2    Diarrhea 2017-02-06 2017-02-07  2    1 
# 11  1   3    Eczema 2017-03-02 2017-03-03  2    1 
# 12  2   3    Sinusitis 2017-03-03 2017-03-04  2    1 
# 13  3   3 Abdominal discomfort 2017-03-04 2017-03-05  2    1 
# 14  4   3   Muscle spasms 2017-03-05 2017-03-06  2    1 
# 15  5   3  Nasopharyngitis 2017-03-06 2017-03-07  2    1 
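
For readers on a newer tidyr, the following is a hedged sketch of the same reshape using `pivot_longer()`, which is not part of either answer above and assumes tidyr 1.0.0 or later. The special `".value"` entry in `names_to` tells tidyr to keep the first regex group as the output column name, so each attribute keeps its original class. The data set `df_small` and the result name `long_small` are hypothetical, used only to keep the example self-contained.

```r
library(tidyverse)  # pivot_longer() assumes tidyr >= 1.0.0

# Hypothetical two-subject, two-event subset with the question's column naming
df_small <- tibble(
  subjid = 1:2,
  event_1 = c("Watery eyes", "Sore throat"),
  start_date_1 = as.Date("2017-01-02") + 0:1,
  stop_date_1 = as.Date("2017-01-03") + 0:1,
  severity_1 = 1,
  related_to_drug_1 = 0,
  event_2 = c("Nausea", "Dizziness"),
  start_date_2 = as.Date("2017-02-02") + 0:1,
  stop_date_2 = as.Date("2017-02-03") + 0:1,
  severity_2 = 2,
  related_to_drug_2 = 1
)

long_small <- df_small %>%
  pivot_longer(
    cols = -subjid,
    # group 1 becomes the column name (".value"), group 2 the event number
    names_to = c(".value", "event_number"),
    names_pattern = "(.*)_(\\d+)$"
  ) %>%
  mutate(event_number = as.integer(event_number)) %>%  # parsed from names as character
  arrange(event_number, subjid)                        # order rows by event, then subject

long_small
```

Because the greedy `(.*)` takes everything up to the last underscore, names such as `related_to_drug_1` split into `related_to_drug` and `1` without the temporary `!!` marker.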