2017-10-05 37 views

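R - How to group by specific column values and dynamically create a new list of column values

I have a data frame like the one below. I need to group by specific columns and create a new column that holds, for each value column, a list of its distinct values and their counts.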

My data frame:

Domain  Process  Name   value1  value2
ML      First    Peter  T1      45
ML      First    Peter  FT      34
ML      First    Peter  T1      34
ML      First    Jhon   LL      11
ML      First    Jhon   LL      11
ML      Second   Peter  IO      22
ML      Second   Peter  IO      33
ML      Second   Peter  IO      33
ML      four     Peter  IO      33

My expected data frame:

Domain  Process  Name   column  listofvalues
ML      First    Peter  value1  list(info1 = "T1", "Count"="2",list(info2 = "FT", "Count"="1"))
ML      First    Peter  value2  list(info1 = "45", "Count"="1",list(info2 ="34", "Count"="2"))
ML      First    Jhon   value1  list(info1 = "LL", "Count"="2")
ML      First    Jhon   value2  list(info1 = "11", "Count"="2")
ML      Second   Peter  value1  list(info1 = "IO", "Count"="3")
ML      Second   Peter  value2  list(info1 = "22", "Count"="1",list(info2 ="33", "Count"="2"))
ML      four     Peter  value1  list(info1 = "IO", "Count"="1")
ML 

dput of the data:

structure(list(Domain = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), .Label = "ML", class = "factor"), Process = structure(c(1L, 
1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L), .Label = c("First", "four", 
"Second"), class = "factor"), Name = structure(c(2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Jhon", "Peter"), class = "factor"), 
    value1 = structure(c(4L, 1L, 4L, 3L, 3L, 2L, 2L, 2L, 2L), .Label = c("FT", 
    "IO", "LL", "T1"), class = "factor"), value2 = structure(c(5L, 
    4L, 4L, 1L, 1L, 2L, 3L, 3L, 3L), .Label = c("11", "22", "33", 
    "34", "45"), class = "factor")), .Names = c("Domain", "Process", 
"Name", "value1", "value2"), row.names = c(NA, -9L), class = "data.frame") 

Answers


You can achieve this with gather() and nest() from tidyr:

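library(tidyr)
library(dplyr)

df <- df %>%
    # stack value1/value2 into key/value pairs, keeping the id columns
    gather(key, value, -c(Domain, Process, Name)) %>%
    # count how often each value occurs per Domain/Process/Name/key
    group_by(Domain, Process, Name, key, value) %>%
    summarise(count = n()) %>%
    ungroup() %>%
    # nest only value and count, so key stays a regular column
    # (tidyr >= 1.0 spells this nest(listofvalues = c(value, count)))
    nest(value, count, .key = "listofvalues")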

df 

# # A tibble: 8 x 5 
#  Domain Process Name key  listofvalues 
#  <chr> <chr> <chr> <chr>   <list> 
# 1  ML First Jhon value1 <tibble [1 x 2]> 
# 2  ML First Jhon value2 <tibble [1 x 2]> 
# 3  ML First Peter value1 <tibble [2 x 2]> 
# 4  ML First Peter value2 <tibble [2 x 2]> 
# 5  ML four Peter value1 <tibble [1 x 2]> 
# 6  ML four Peter value2 <tibble [1 x 2]> 
# 7  ML Second Peter value1 <tibble [1 x 2]> 
# 8  ML Second Peter value2 <tibble [2 x 2]> 

df$listofvalues[[3]] 

# # A tibble: 2 x 2 
# value count 
# <chr> <int> 
# 1 FT  1 
# 2 T1  2 
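
For readers on a newer tidyr (1.0 or later), where gather() is superseded, the same reshaping can be sketched with pivot_longer(), count() and the named form of nest(). This is a minimal sketch rather than the answer's original code; the data frame below is rebuilt by hand from the question's table with character columns instead of factors, and the name nested is chosen here only for illustration.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of the question's data (assumption: characters, not factors).
df <- data.frame(
  Domain  = "ML",
  Process = c("First", "First", "First", "First", "First",
              "Second", "Second", "Second", "four"),
  Name    = c("Peter", "Peter", "Peter", "Jhon", "Jhon",
              "Peter", "Peter", "Peter", "Peter"),
  value1  = c("T1", "FT", "T1", "LL", "LL", "IO", "IO", "IO", "IO"),
  value2  = c("45", "34", "34", "11", "11", "22", "33", "33", "33"),
  stringsAsFactors = FALSE
)

nested <- df %>%
  # Stack value1/value2 into key/value pairs, as gather() does.
  pivot_longer(c(value1, value2), names_to = "key", values_to = "value") %>%
  # Count each value within Domain/Process/Name/key.
  count(Domain, Process, Name, key, value, name = "count") %>%
  # Pack value and count into one list-column per remaining combination.
  nest(listofvalues = c(value, count))

# Inspect the nested tibble for First / Peter / value1.
nested %>%
  filter(Process == "First", Name == "Peter", key == "value1") %>%
  pull(listofvalues)
```

count() returns an ungrouped result, so no explicit ungroup() is needed before nest() in this variant.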

If you are determined to spread the nested column, you can add

mutate(listofvalues = purrr::map(listofvalues, spread, value, count)) 

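to the pipe chain. However, I don't recommend it unless it is really necessary, partly because your numeric values would become column names.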

df$listofvalues[[4]] 

# # A tibble: 1 x 2 
# `34` `45` 
# * <int> <int> 
# 1  2  1 
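
If a share is wanted rather than a raw count (as asked in the comments), one possible approach is to divide each count by the group total before nesting. The sketch below assumes that "fraction" means the proportion of each value within its Domain/Process/Name/key group; the column name fraction, the object name fractions and the reduced data set are all illustrative, and it uses the newer tidyr interface (pivot_longer(), named nest()).

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the question's data (Process "First" only).
df <- data.frame(
  Domain  = "ML",
  Process = "First",
  Name    = c("Peter", "Peter", "Peter", "Jhon", "Jhon"),
  value1  = c("T1", "FT", "T1", "LL", "LL"),
  value2  = c("45", "34", "34", "11", "11"),
  stringsAsFactors = FALSE
)

fractions <- df %>%
  pivot_longer(c(value1, value2), names_to = "key", values_to = "value") %>%
  count(Domain, Process, Name, key, value, name = "count") %>%
  # Share of each value within its Domain/Process/Name/key group.
  group_by(Domain, Process, Name, key) %>%
  mutate(fraction = count / sum(count)) %>%
  ungroup() %>%
  nest(listofvalues = c(value, count, fraction))

# Nested tibble for Peter / value1: one row per distinct value.
fractions %>%
  filter(Name == "Peter", key == "value1") %>%
  pull(listofvalues)
```

Dropping count from the nest() call would leave only value and fraction in each nested tibble.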

Thanks for your response, it works as I expected.

I have one more question: please help me with how to calculate a fraction instead of a count.
