2017-07-10 49 views

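How do I assign a unique identifier to each unique set of data frame values across different column groups?

I have a data frame that looks similar to this: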

teamAPlayer1 teamAPlayer2 teamBPlayer1 teamBPlayer2 
Jack   Jill   Matt   Megan 
Jill   Jack   Megan   Matt 
Megan   Jill   Matt   Jack 
Megan   Matt   Jill   Jack 
Megan   Jack   Jill   Matt 

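My goal is to assign a unique ID to each unique team lineup, regardless of the order of the players and of whether they are on team A or team B. For the example above, I would like to add the following two columns to my data frame: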

teamAPlayer1 teamAPlayer2 teamAID teamBPlayer1 teamBPlayer2 teamBID 
Jack   Jill   1   Matt   Megan   2 
Jill   Jack   1   Megan   Matt   2 
Megan   Jill   3   Matt   Jack   4 
Megan   Matt   2   Jill   Jack   1 
Jack   Matt   4   Jill   Megan   3 

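I could write a solution that indexes with for/while loops, but I am working with a very large data frame and 5 players per team rather than 2, so the script takes a long time to run. Is it possible to solve this with a vectorized approach?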


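You have received several answers below. If one of them solved your problem, please consider accepting it as the answer. This lets the community know that the answer works and that your question can be closed. – CPak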

Answers

0

Your data

df <- data.frame(teamAPlayer1=c("Jack","Jill","Megan","Megan","Megan"), 
       teamAPlayer2=c("Jill","Jack","Jill","Matt","Jack"), 
       teamBPlayer1=c("Matt","Megan","Matt","Jill","Jill"), 
       teamBPlayer2=c("Megan","Matt","Jack","Jack","Matt"), 
       stringsAsFactors=FALSE) 

Make a vector of unique player names

library(dplyr)

# Grab all unique player names - assign to each a number
unique.id <- seq_along(unique(unlist(df)))
names(unique.id) <- unique(unlist(df))

# Paste and sort player pair combinations in new columns
df1 <- df %>%
    rowwise() %>%
    mutate(teamApairs=paste0(sort(c(unique.id[teamAPlayer1],unique.id[teamAPlayer2])),collapse=" ")) %>%
    mutate(teamBpairs=paste0(sort(c(unique.id[teamBPlayer1],unique.id[teamBPlayer2])),collapse=" ")) %>%
    ungroup() # drop the row-wise grouping before the lookup below

Make a vector of unique player pairs

# Grab all unique player pairs - assign to each a unique number 
unique.pairs <- seq(1, length(unique(unlist(df1[,5:6]))), 1) 
names(unique.pairs) <- unique(unlist(df1[,5:6])) 

# Factorize unique player pairs as unique number 
df2 <- df1 %>% 
     mutate(teamAID=unique.pairs[teamApairs]) %>% 
     mutate(teamBID=unique.pairs[teamBpairs]) %>% 
     select(-teamApairs,-teamBpairs) 

Output

teamAPlayer1 teamAPlayer2 teamBPlayer1 teamBPlayer2 teamAID teamBID 
1   Jack   Jill   Matt  Megan  1  3 
2   Jill   Jack  Megan   Matt  1  3 
3  Megan   Jill   Matt   Jack  2  5 
4  Megan   Matt   Jill   Jack  3  1 
5  Megan   Jack   Jill   Matt  4  6 
0

Your expected output doesn't match your input (see the last row), but I think this gets you what you want:

library(magrittr)   # %>%, %<>%, set_colnames
library(data.table) # as.data.table, :=, .GRP

df <- read.table(text="teamAPlayer1 teamAPlayer2 teamBPlayer1 teamBPlayer2 
Jack   Jill   Matt   Megan 
Jill   Jack   Megan   Matt 
Megan   Jill   Matt   Jack 
Megan   Matt   Jill   Jack 
Megan   Jack   Jill   Matt",stringsAsFactors=FALSE,header=TRUE) 

dt_concat <- matrix(unlist(t(df)),ncol=2,byrow=TRUE) %>% # create a two column matrix with team compositions 
    cbind(.,team = apply(.,1,. %>% sort %>% paste(collapse=" "))) %>% as.data.table # add column with sorted team members in a string 
dt_concat[, teamID := .GRP, by = team] # attribute ids 
df %<>% cbind(dt_concat$teamID %>% matrix(ncol=2,byrow=TRUE) %>% set_colnames(c("teamAID","teamBID"))) # add ids to original df 

# teamAPlayer1 teamAPlayer2 teamBPlayer1 teamBPlayer2 teamAID teamBID 
# 1   Jack   Jill   Matt  Megan  1  2 
# 2   Jill   Jack  Megan   Matt  1  2 
# 3  Megan   Jill   Matt   Jack  3  4 
# 4  Megan   Matt   Jill   Jack  2  1 
# 5  Megan   Jack   Jill   Matt  5  6 
0

Here is an approach using pmin and pmax:

v1 <- paste(do.call(pmin, df[c(1:2)]), do.call(pmax, df[c(1:2)])) 
v2 <- paste(do.call(pmin, df[c(3:4)]), do.call(pmax, df[c(3:4)])) 
v3 <- unique(c(rbind(v1, v2))) 

teamAID <- match(v1, v3) 
#[1] 1 1 3 2 5 

teamBID <- match(v2, v3) 
#[1] 2 2 4 1 6 

This won't work if some pairs appear on only one team (e.g. try 'df <- df[-4,]') –


@Moody_Mudskipper You are right. Edited to cover that case – Sotos
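
The question mentions five players per team, and pmin/pmax on two columns does not extend to that case directly. A minimal base R sketch of the same idea for any team size follows: build one order-independent key per row and team, pool the keys, then use match. The helper name team_key and the "^teamA" / "^teamB" column-name patterns are assumptions made for illustration, and apply() loops over rows internally, so this is not strictly vectorized.

```r
# Toy data from the question (two players per team; more player columns work the same way)
df <- data.frame(
  teamAPlayer1 = c("Jack", "Jill", "Megan", "Megan", "Megan"),
  teamAPlayer2 = c("Jill", "Jack", "Jill", "Matt", "Jack"),
  teamBPlayer1 = c("Matt", "Megan", "Matt", "Jill", "Jill"),
  teamBPlayer2 = c("Megan", "Matt", "Jack", "Jack", "Matt"),
  stringsAsFactors = FALSE
)

# team_key() is a hypothetical helper: it sorts the names within each row
# and pastes them, so "Jack Jill" and "Jill Jack" yield the same key
team_key <- function(cols) {
  apply(as.matrix(cols), 1, function(p) paste(sort(p), collapse = " "))
}

# Select each team's player columns by name prefix (assumed naming scheme)
keyA <- team_key(df[grepl("^teamA", names(df))])
keyB <- team_key(df[grepl("^teamB", names(df))])

# Pool the keys of both teams so a lineup gets the same ID on either side,
# numbered in order of first appearance (row by row, A before B)
lineups <- unique(c(rbind(keyA, keyB)))
df$teamAID <- match(keyA, lineups)
df$teamBID <- match(keyB, lineups)
```

Because the IDs come from the pooled keys, a lineup that only ever appears on one side still gets its own ID, which is the case raised in the comment above.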

0
A simple solution

Allow me to suggest that you reshape the original data entirely.

library(data.table) 
library(magrittr) 
setDT(df) 

df %>% 
    .[, Round := 1:.N] %>% 
    .[] # this is only here to view the result 

    teamAPlayer1 teamAPlayer2 teamBPlayer1 teamBPlayer2 Round 
1:   Jack   Jill   Matt  Megan  1 
2:   Jill   Jack  Megan   Matt  2 
3:  Megan   Jill   Matt   Jack  3 
4:  Megan   Matt   Jill   Jack  4 
5:  Megan   Jack   Jill   Matt  5 

That is, each row of the original data is identified by a Round (one round of play). You can then reshape the data:

df %>% 
    .[, Round := 1:.N] %>% 
    melt.data.table(id.vars = "Round", 
        value.name = "participant") %>% 
    .[, Event := gsub("team([AB]).*$", "\\1", variable)] %>% 
    # Ordering by participant necessary to define 
    # distinct combinations JackJill == JillJack 
    .[order(Round, participant, Event)] %>% 
    .[, 
    .(Team = paste0(participant, collapse = "")), 
    keyby = .(Round, Event)] 

    Round Event  Team 
1:  1  A JackJill 
2:  1  B MattMegan 
3:  2  A JackJill 
4:  2  B MattMegan 
5:  3  A JillMegan 
6:  3  B JackMatt 
7:  4  A MattMegan 
8:  4  B JackJill 
9:  5  A JackMegan 
10:  5  B JillMatt 
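
The long table above identifies each lineup by a concatenated name, whereas the question asked for numeric IDs. As a minimal sketch, assuming the long result has been stored in a plain data frame (here called long, a name chosen for illustration and rebuilt by hand from the printed output), the IDs can be assigned in order of first appearance with match:

```r
# Long-format result copied from the printed output above
long <- data.frame(
  Round = rep(1:5, each = 2),
  Event = rep(c("A", "B"), times = 5),
  Team  = c("JackJill", "MattMegan", "JackJill", "MattMegan", "JillMegan",
            "JackMatt", "MattMegan", "JackJill", "JackMegan", "JillMatt"),
  stringsAsFactors = FALSE
)

# Number each distinct lineup by the order in which it first appears
long$TeamID <- match(long$Team, unique(long$Team))
```

With a data.table, the `.GRP` idiom shown in an earlier answer (`[, teamID := .GRP, by = team]`) should give the same kind of numbering.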

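This format has many advantages. For example, you can add another column, 'Score', that refers unambiguously to a specific game instead of relying on the order of the columns. However, if you want something closer to the original, you can always use dcast: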

df %>% 
    .[, Round := 1:.N] %>% 
    melt.data.table(id.vars = "Round", 
        value.name = "participant") %>% 
    .[, Event := gsub("team([AB]).*$", "\\1", variable)] %>% 
    # Ordering by participant necessary to define 
    # distinct combinations JackJill == JillJack 
    .[order(Round, participant, Event)] %>% 
    .[, 
    .(Team = paste0(participant, collapse = "")), 
    keyby = .(Round, Event)] %>% 
    dcast.data.table(Round ~ Event) 

    Round   A   B 
1:  1 JackJill MattMegan 
2:  2 JackJill MattMegan 
3:  3 JillMegan JackMatt 
4:  4 MattMegan JackJill 
5:  5 JackMegan JillMatt 