2015-12-30
1

A better way to split one column into many columns and then gather the results?

I have a data frame that looks like this:

message.id,sender,recipients 
1,A,B|C 
2,A,B 
3,B,C|D|Q 

I want to split the recipients column on "|" and then gather the pieces to produce this result:

message.id,sender,recipient 
1,A,B 
1,A,C 
2,A,B 
3,B,C 
3,B,D 
3,B,Q 

What is a cleaner way to accomplish this? Here is my current code:

library(dplyr) 
library(stringr) 
library(tidyr) 

df <- data.frame(message.id = c(1,2,3), 
       sender = c("A","A","B"), 
       recipients = c("B|C","B","C|D|Q")) 

max.splits = df$recipients %>% str_count("\\|") %>% max + 1 

df %>% separate(recipients,1:max.splits, sep = "\\|") %>% 
    gather(trash,recipient,-message.id,-sender) %>% 
    select(message.id, sender, recipient) %>% 
    filter(recipient %>% is.na == FALSE) %>% 
    arrange(message.id) 
+0

`library(splitstackshape); cSplit(df, "recipients", "|", "long")`, but I'm biased. – A5C1D2H2I1M1N2O1R2T1

+0

However, you might be looking for something like `df %>% mutate(recipients = strsplit(as.character(recipients), "\\|")) %>% unnest(recipients)`.... – A5C1D2H2I1M1N2O1R2T1

Answers

1

We can use data.table:

library(data.table) 
# as.character() guards against recipients having been stored as a factor
setDT(df)[, list(recipient = unlist(strsplit(as.character(recipients), '[|]'))), 
       .(message.id, sender)] 
1

How about this, using plyr:

library(plyr) 
ddply(df, .(message.id), function(d){ 
    # one output row per recipient; sender is recycled to match 
    data.frame(
     sender = as.character(d$sender), 
     recipients = strsplit(as.character(d$recipients), "\\|")[[1]], 
     stringsAsFactors = FALSE 
    ) 
}) 
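
For comparison, the same split-per-row idea can be sketched in base R with no extra packages. This is only an illustrative sketch, not part of the answer above: the names `parts` and `long` are made up here, and the data frame is rebuilt with `stringsAsFactors = FALSE` so that `strsplit` receives character input.

```r
# Sample data from the question, kept as character columns
df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# Split each recipients string on a literal "|" (fixed = TRUE, so no regex escaping)
parts <- strsplit(df$recipients, "|", fixed = TRUE)

# Repeat each id and sender once per recipient, then flatten the list of pieces
long <- data.frame(message.id = rep(df$message.id, lengths(parts)),
                   sender = rep(df$sender, lengths(parts)),
                   recipient = unlist(parts),
                   stringsAsFactors = FALSE)
```

`lengths(parts)` gives the number of recipients per message, so `rep` lines the ids and senders up with the flattened recipient vector.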
3

I'm biased, but I would suggest cSplit from my "splitstackshape" package.

Usage is simply:

library(splitstackshape) 
cSplit(df, "recipients", "|", "long") 
# message.id sender recipients 
# 1:   1  A   B 
# 2:   1  A   C 
# 3:   2  A   B 
# 4:   3  B   C 
# 5:   3  B   D 
# 6:   3  B   Q 

Alternatively, combining the pipes from "dplyr" with unnest from "tidyr", you can try:

library(dplyr) 
library(tidyr) 
df %>% 
    mutate(recipients = as.character(recipients)) %>%   ## need character for strsplit 
    mutate(recipients = strsplit(recipients, "|", TRUE)) %>% ## Use `fixed = TRUE` 
    unnest(recipients)          ## `unnest` goes to long form 
# Source: local data frame [6 x 3] 
# 
# message.id sender recipients 
#  (dbl) (fctr)  (chr) 
# 1   1  A   B 
# 2   1  A   C 
# 3   2  A   B 
# 4   3  B   C 
# 5   3  B   D 
# 6   3  B   Q 
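
As a possible shortcut for the `strsplit` plus `unnest` pattern above, later releases of "tidyr" include `separate_rows`, which splits a delimited column straight into long form. The sketch below assumes a tidyr version that provides it (0.5.0 or newer); the result name `long` and the final `rename` step are illustrative additions, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, kept as character columns
df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

long <- df %>%
    separate_rows(recipients, sep = "\\|") %>%  # sep is a regex, so "|" is escaped
    rename(recipient = recipients)              # singular name, as in the desired output
```

This should give one row per message and recipient, in the original row order, without having to know the maximum number of splits in advance.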
1

Here is a solution using dplyr and tidyr:

df <- data.frame(message.id = 1:3, sender = c("A","A","B"), 
recipients = c("B|C","B","C|D|Q")) 

Original data

message.id sender recipients 
1   1  A  B|C 
2   2  A   B 
3   3  B  C|D|Q 

Code

df %>% separate(recipients,into =c("r1","r2","r3")) %>% 
gather("sen","recipient",r1:r3) %>% select(-sen) %>% 
filter(!is.na(recipient)) 

Result

message.id sender recipient 
1   1  A   B 
2   2  A   B 
3   3  B   C 
4   1  A   C 
5   3  B   D 
6   3  B   Q 