2017-10-04 79 views
0

How to remove duplicate values within a single column of a multi-column data frame in R

Sample data:

  sessionid    qf  Office 
       12    3  LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3 
       12    4  DEL2,DEL1,LON1,DEL1 
       13    5  MAn1,LON1,DEL1,LON1 
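
For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch: the name df, the integer column types, and stringsAsFactors = FALSE are assumptions, not something stated in the question.

```r
# Assumed reconstruction of the sample data; the object name `df` and the
# column types are guesses based on the printed table.
df <- data.frame(
  sessionid = c(12L, 12L, 13L),
  qf        = c(3L, 4L, 5L),
  Office    = c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
                "DEL2,DEL1,LON1,DEL1",
                "MAn1,LON1,DEL1,LON1"),
  stringsAsFactors = FALSE  # keep Office as character so strsplit() accepts it
)
str(df)
```

Keeping Office as a character column matters because strsplit() rejects factors.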

Here, I want to remove the duplicate values within each row of the "Office" column.

Expected output

  sessionid    qf  Office 
       12    3  LON1,LON2,SEA2,SEA3 
       12    4  DEL2,DEL1,LON1 
       13    5  MAn1,LON1,DEL1 

Answers

2

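We can use tidyverse. Split "Office" by the delimiter and expand it into "long" format, take the distinct rows, then group by "sessionid" and "qf" and paste the contents of "Office" back together.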

library(tidyverse) 
separate_rows(df1, Office) %>% 
     distinct() %>% 
    group_by(sessionid, qf) %>% 
    summarise(Office = toString(Office)) 
# A tibble: 3 x 3 
# Groups: sessionid [?] 
# sessionid qf     Office 
#  <int> <int>     <chr> 
#1  12  3 LON1, LON2, SEA2, SEA3 
#2  12  4  DEL2, DEL1, LON1 
#3  13  5  MAn1, LON1, DEL1 
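
One thing to watch with the approach above: if two input rows share the same sessionid and qf, grouping by those columns would merge them into one row. A possible variant, sketched here under some assumptions, tags each input row before splitting. The helper column row_id is a made-up name, the data frame is rebuilt from the question's table, and the .groups argument needs dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the question (types are an assumption)
df1 <- data.frame(
  sessionid = c(12L, 12L, 13L),
  qf        = c(3L, 4L, 5L),
  Office    = c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
                "DEL2,DEL1,LON1,DEL1",
                "MAn1,LON1,DEL1,LON1"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  mutate(row_id = row_number()) %>%       # hypothetical helper column: one id per input row
  separate_rows(Office, sep = ",") %>%    # one office code per row
  distinct() %>%                          # drop repeats within each original row
  group_by(row_id, sessionid, qf) %>%
  summarise(Office = paste(Office, collapse = ","), .groups = "drop") %>%
  select(-row_id)

result
```

Using paste(..., collapse = ",") instead of toString() keeps the comma-only separator used in the expected output.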
2

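Here is a base R way of doing it that gives the output you expect: first split Office on commas, remove the duplicate values, then paste the pieces back together.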

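# Split each string on commas, keep the first occurrence of each value,
# then collapse the remaining values back into one comma-separated string
df$Office <- sapply(lapply(strsplit(df$Office, ","), unique),
                    function(x) paste(x, collapse = ","))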

Or with %>%:

library(magrittr)  # provides %>%
df$Office <- df$Office %>%
    strsplit(",") %>%
    lapply(unique) %>%
    sapply(paste, collapse = ",")
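
The same split, unique, paste steps could also be wrapped in a small reusable function. This is only a sketch: dedupe_csv is a made-up name, and the as.character() call is there on the assumption that Office might have been read in as a factor.

```r
# Hypothetical helper: remove repeated values inside each delimited string.
dedupe_csv <- function(x, sep = ",") {
  # as.character() guards against factor input, which strsplit() rejects
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  # vapply() guarantees exactly one string per input element
  vapply(parts, function(p) paste(unique(p), collapse = sep), character(1))
}

offices <- c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
             "DEL2,DEL1,LON1,DEL1",
             "MAn1,LON1,DEL1,LON1")
dedupe_csv(offices)
```

It would then be applied as df$Office <- dedupe_csv(df$Office). Matching is exact, so values that differ only in case or surrounding spaces are treated as different.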