2017-04-05

Aggregated rolling mean with conditional statements in R

I have a data frame in the following format:

match team1 team2 winningTeam 
1  A  D  A 
2  B  E  E 
3  C  F  C 
4  D  C  C 
5  E  B  B 
6  F  A  A 
7  A  D  D 
8  D  A  A 

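I want to create variables that measure the form of team1 and team2 over their last x matches. For example, I want a variable called team1_form_last3_matches. For match 8 it would be 0.33, because team D won 1 of its last 3 matches. I also want a variable called team2_form_last3_matches, which for match 8 would be 0.67, because team A won 2 of its last 3 matches.

Ideally, I would like to be able to specify the number of previous matches x used to calculate the team1_form_lastx / team2_form_lastx variables, and have those variables created automatically. I have tried a bunch of approaches using dplyr, zoo's rolling-mean functions and nested for/if statements. However, I haven't quite cracked it, and certainly not in an elegant way. I feel I'm missing a simple solution to this generic problem. Any help would be much appreciated!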
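The example numbers for match 8 can be checked with a minimal sketch. The helper name prev_form is hypothetical, and the sketch assumes the rows are sorted by match so that tail() picks the most recent games:

```r
dat <- data.frame(match = 1:8,
                  team1 = c("A","B","C","D","E","F","A","D"),
                  team2 = c("D","E","F","C","B","A","D","A"),
                  winningTeam = c("A","E","C","C","B","A","D","A"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: share of `team`'s last n matches before match m that it won
prev_form <- function(dat, team, m, n) {
  prev <- dat[dat$match < m & (dat$team1 == team | dat$team2 == team), ]
  prev <- tail(prev, n)            # most recent n earlier matches (rows sorted by match)
  mean(prev$winningTeam == team)   # wins / games
}

prev_form(dat, "D", 8, 3)  # team1 of match 8: of matches 1, 4, 7 it won only 7, so 1/3
prev_form(dat, "A", 8, 3)  # team2 of match 8: of matches 1, 6, 7 it won 1 and 6, so 2/3
```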

Cheers,

Jack

Answers


This works for t1l3 (team 1's form over its last 3 matches). You will need to replicate it for team 2.

dat <- data.frame(match = c(1:8), team1 = c("A","B","C","D","E","F","A","D"), team2 = c("D","E","F","C","B","A","D","A"), winningTeam = c("A","E","C","C","B","A","D","A"),stringsAsFactors = FALSE) 

dat$t1l3 <- c(NA,sapply(2:nrow(dat),function(i) { 
    df <- dat[1:(i-1),] #just previous games, i.e. excludes current game 
    df <- df[df$team1==dat$team1[i] | df$team2==dat$team1[i],] #just those containing T1 
    df <- tail(df,3) #just the last three (or fewer if there aren't three previous games) 
    return(sum(df$winningTeam==dat$team1[i])/nrow(df)) #total wins/total games (up to three) 
})) 
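The question asks for x to be a parameter and for both team columns to be created automatically. One way to do that is to wrap the same loop in a function. This is a sketch, not the answerer's code. The name add_form, the column-naming scheme and the require_n switch are assumptions. require_n returns NA until a team has x previous matches, as the asker raises in the comment below. Rows are assumed to be sorted by match.

```r
dat <- data.frame(match = 1:8,
                  team1 = c("A","B","C","D","E","F","A","D"),
                  team2 = c("D","E","F","C","B","A","D","A"),
                  winningTeam = c("A","E","C","C","B","A","D","A"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: adds team1_form_last<n>_matches and team2_form_last<n>_matches.
# require_n = FALSE divides by however many earlier matches exist (as in the answer);
# require_n = TRUE returns NA until the team has played n earlier matches.
add_form <- function(dat, n, require_n = FALSE) {
  for (side in c("team1", "team2")) {
    form <- sapply(seq_len(nrow(dat)), function(i) {
      team <- dat[[side]][i]
      prev <- dat[seq_len(i - 1), ]                 # strictly earlier matches only
      prev <- prev[prev$team1 == team | prev$team2 == team, ]
      prev <- tail(prev, n)                         # at most the last n of them
      if (nrow(prev) == 0 || (require_n && nrow(prev) < n)) return(NA_real_)
      mean(prev$winningTeam == team)                # wins / games actually used
    })
    dat[[paste0(side, "_form_last", n, "_matches")]] <- form
  }
  dat
}

res <- add_form(dat, 3)
res[, c("match", "team1_form_last3_matches", "team2_form_last3_matches")]
```

Calling add_form(dat, 5) would instead add team1_form_last5_matches and team2_form_last5_matches.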

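Hi. Thanks for replying with an answer. I was thinking today that something with this structure would work best. I tried the above and it almost works. However, in my scenario I want the results of the last three matches excluding the current match, and I think the above includes it? Also, why doesn't the above produce NAs for the first two matches a team plays, since there isn't enough data to calculate last-three form? Thanks again! –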


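Hi Jack. The above should exclude the current game; that is what the 'dat[1:(i-1),]' term does. 'tail' returns the last part of a data.frame (or vector, etc.), up to the specified number of elements. Now that you mention it, if a team has played fewer than three previous games, the divisor shouldn't be three! I have amended the answer above accordingly. –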
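The divisor point can be seen on a plain logical vector. This is a minimal illustration; prev_results is made-up data, not taken from the example data frame:

```r
# Results (TRUE = win) for a team that has played only two earlier matches
prev_results <- c(TRUE, FALSE)

last3 <- tail(prev_results, 3)   # tail() returns both elements; it does not pad to 3
length(last3)                    # 2

sum(last3) / 3                   # 0.333...: dividing by 3 understates the form
sum(last3) / length(last3)       # 0.5: divide by the number of games actually used
```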


How about something like this:

dat <- data.frame(match = c(1:8), team1 = c("A","B","C","D","E","F","A","D"), team2 = c("D","E","F","C","B","A","D","A"), winningTeam = c("A","E","C","C","B","A","D","A")) 
    match team1 team2 winningTeam 
1  1  A  D   A 
2  2  B  E   E 
3  3  C  F   C 
4  4  D  C   C 
5  5  E  B   B 
6  6  F  A   A 
7  7  A  D   D 
8  8  D  A   A 

Allteams <- c("A","B","C","D","E","F") 

# A vectorized function for you to use to do as you ask: 
teamX_form_lastY <- function(teams, games, dat){ 
    sapply(teams, function(x) { 
    games_info <- rowSums(dat[,c("team1","team2")] == x) + (dat[,"winningTeam"] == x) 
    lookup <- ifelse(rev(games_info[games_info != 0])==2,1,0) 
    games_won <- sum(lookup[1:games]) 
    if(length(lookup) < games) warning(paste("maximum games for team",x,"should be",length(lookup))) 
    games_won/games 
    }) 
} 

teamX_form_lastY("A", 4, dat) 
A 
0.75 

# Has a warning for the number of games you should be using 
teamX_form_lastY("A", 5, dat) 
A 
NA 
Warning message: 
    In FUN(X[[i]], ...) : maximum games for team A should be 4 

# vectorized input 
teamX_form_lastY(teams = c("A","B"), games = 2, dat = dat) 
A B 
0.5 0.5 

# so you can do all teams
teamX_form_lastY(teams = Allteams, 2, dat) 
A B C D E F 
0.5 0.5 1.0 0.5 0.5 0.0 
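teamX_form_lastY gives one number per team. It covers that team's most recent games in the whole data set, including the latest match, rather than one value per row. A per-match variant of the same per-team idea can be written in base R with a long format and ave(). The long-format approach and the column names are assumptions chosen to match the question; they are not part of this answer.

```r
dat <- data.frame(match = 1:8,
                  team1 = c("A","B","C","D","E","F","A","D"),
                  team2 = c("D","E","F","C","B","A","D","A"),
                  winningTeam = c("A","E","C","C","B","A","D","A"),
                  stringsAsFactors = FALSE)
n <- 3  # number of previous matches to average over

# Long format: one row per (match, side), with a 0/1 win flag for that team
long <- data.frame(match = rep(dat$match, 2),
                   side  = rep(c("team1", "team2"), each = nrow(dat)),
                   team  = c(dat$team1, dat$team2),
                   won   = c(dat$team1 == dat$winningTeam,
                             dat$team2 == dat$winningTeam) * 1,
                   stringsAsFactors = FALSE)
long <- long[order(long$team, long$match), ]   # chronological within each team

# Within each team: mean of the previous n results, excluding the current match
long$form <- ave(long$won, long$team, FUN = function(w) {
  sapply(seq_along(w), function(k) {
    if (k == 1) return(NA_real_)               # no earlier matches yet
    mean(tail(w[seq_len(k - 1)], n))
  })
})

# Spread back to one column per side, lined up on the match number
for (s in c("team1", "team2")) {
  part <- long[long$side == s, ]
  dat[[paste0(s, "_form_last", n, "_matches")]] <- part$form[match(dat$match, part$match)]
}
dat
```

Setting n <- 5 and rerunning would produce team1_form_last5_matches and team2_form_last5_matches instead.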

See the updated answer above. –


Hi Evan. Thanks for the reply! This also works, but I prefer the other solution because it writes the results directly into the data frame. –


I agree. Cheers~ –