2016-10-26 248 views
0

我有兩個數據幀包含相關數據。這與NFL有關。一個DF有按周球員的名字和接收目標(玩家DF):R:如何從兩個其他數據幀創建一個新的數據幀

  Player Tm Position 1 2 3 4 5 6 
1  A.J. Green CIN  WR 13 8 11 12 8 10 
2 Aaron Burbridge SFO  WR 0 1 0 2 0 0 
3 Aaron Ripkowski GNB  RB 0 0 0 0 0 1 
4 Adam Humphries TAM  WR 5 8 12 4 2 0 
5 Adam Thielen MIN  WR 5 5 4 3 8 0 
6 Adrian Peterson MIN  RB 2 3 0 0 0 0 

其他數據幀recieving通過團隊總結目標,每星期(團隊DF):

 Tm `1` `2` `3` `4` `5` `6` 
    <fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 
1  ARI 37 35 50 45 26 35 
2  ATL 38 34 30 37 28 41 
3  BAL 32 45 40 51 47 48 
4  BUF 22 30 20 33 20 26 
5  CAR 31 39 36 47 28 46 
6  CHI 28 29 45 36 41 49 
7  CIN 30 54 28 31 39 31 
8  CLE 26 33 38 38 35 42 
9  DAL 43 30 24 32 24 27 
10 DEN 26 32 35 31 34 47 
# ... with 22 more rows 

我是什麼試圖做的是按星期創建另一個包含玩家目標百分比的數據框。所以我需要匹配球員df中的「Tm」列和周列標題(1-6)中的球隊。

我已經找到了如何通過將它們合併,然後創建新行要做到這一點,但我添加更多的數據(周)我需要編寫更多的代碼:

a <- merge(playertgt, teamtgt, by="Tm") #merges the two 
    a$Wk1 <- a$`1.x`/a$`1.y` 
    a$Wk2 <- a$`2.x`/a$`2.y` 
    a$Wk3 <- a$`3.x`/a$`3.y` 

所以我要尋找是一個很好的方法來做到這一點,將自動更新,並不需要創建一個df與我不需要的一堆列,並將更新與新周,因爲我將它們添加到我的源數據。

如果在其他地方回答這個問題,我很抱歉,但我一直在尋找一種很好的方法來做到這一點,我找不到它。在此先感謝您的幫助!很顯然,我只是在完成合並後選擇列使用dplyrends_with方便

library(dplyr) 
## Do a left outer join to match each player with total team targets 
a <- left_join(playertgt,teamtgt, by="Tm") 
## Compute percentage over all weeks selecting player columns ending with ".x" 
## and dividing by corresponding team columns ending with ".y" 
tgt.pct <- select(a,ends_with(".x"))/select(a,ends_with(".y")) 
## set the column names to week + number 
colnames(tgt.pct) <- paste0("week",seq_len(ncol(teamtgt)-1)) 
## construct the output data frame adding back the player and team columns 
tgt.pct <- data.frame(Player=playertgt$Player,Tm=playertgt$Tm,tgt.pct) 

回答

2

你可以用dplyr做到這一點。使用grepl做這個選擇的基-R的做法是:

a <- merge(playertgt, teamtgt, by="Tm", all.x=TRUE) 
tgt.pct <- subset(a,select=grepl(".x$",colnames(a)))/subset(a,select=grepl(".y$",colnames(a))) 
colnames(tgt.pct) <- paste0("week",seq_len(ncol(teamtgt)-1)) 
tgt.pct <- data.frame(Player=playertgt$Player,Tm=playertgt$Tm,tgt.pct) 

數據:用有限的發佈數據中,只有AJ格林將有他的目標百分比計算:

playertgt <- structure(list(Player = structure(1:6, .Label = c("A.J. Green", 
"Aaron Burbridge", "Aaron Ripkowski", "Adam Humphries", "Adam Thielen", 
"Adrian Peterson"), class = "factor"), Tm = structure(c(1L, 4L, 
2L, 5L, 3L, 3L), .Label = c("CIN", "GNB", "MIN", "SFO", "TAM" 
), class = "factor"), Position = structure(c(2L, 2L, 1L, 2L, 
2L, 1L), .Label = c("RB", "WR"), class = "factor"), X1 = c(13L, 
0L, 0L, 5L, 5L, 2L), X2 = c(8L, 1L, 0L, 8L, 5L, 3L), X3 = c(11L, 
0L, 0L, 12L, 4L, 0L), X4 = c(12L, 2L, 0L, 4L, 3L, 0L), X5 = c(8L, 
0L, 0L, 2L, 8L, 0L), X6 = c(10L, 0L, 1L, 0L, 0L, 0L)), .Names = c("Player", 
"Tm", "Position", "X1", "X2", "X3", "X4", "X5", "X6"), class = "data.frame", row.names = c(NA, 
-6L)) 
##   Player Tm Position X1 X2 X3 X4 X5 X6 
##1  A.J. Green CIN  WR 13 8 11 12 8 10 
##2 Aaron Burbridge SFO  WR 0 1 0 2 0 0 
##3 Aaron Ripkowski GNB  RB 0 0 0 0 0 1 
##4 Adam Humphries TAM  WR 5 8 12 4 2 0 
##5 Adam Thielen MIN  WR 5 5 4 3 8 0 
##6 Adrian Peterson MIN  RB 2 3 0 0 0 0 

teamtgt <- structure(list(Tm = structure(1:10, .Label = c("ARI", "ATL", 
"BAL", "BUF", "CAR", "CHI", "CIN", "CLE", "DAL", "DEN"), class = "factor"), 
    X1 = c(37L, 38L, 32L, 22L, 31L, 28L, 30L, 26L, 43L, 26L), 
    X2 = c(35L, 34L, 45L, 30L, 39L, 29L, 54L, 33L, 30L, 32L), 
    X3 = c(50L, 30L, 40L, 20L, 36L, 45L, 28L, 38L, 24L, 35L), 
    X4 = c(45L, 37L, 51L, 33L, 47L, 36L, 31L, 38L, 32L, 31L), 
    X5 = c(26L, 28L, 47L, 20L, 28L, 41L, 39L, 35L, 24L, 34L), 
    X6 = c(35L, 41L, 48L, 26L, 46L, 49L, 31L, 42L, 27L, 47L)), .Names = c("Tm", 
"X1", "X2", "X3", "X4", "X5", "X6"), class = "data.frame", row.names = c(NA, 
-10L)) 
## Tm X1 X2 X3 X4 X5 X6 
##1 ARI 37 35 50 45 26 35 
##2 ATL 38 34 30 37 28 41 
##3 BAL 32 45 40 51 47 48 
##4 BUF 22 30 20 33 20 26 
##5 CAR 31 39 36 47 28 46 
##6 CHI 28 29 45 36 41 49 
##7 CIN 30 54 28 31 39 31 
##8 CLE 26 33 38 38 35 42 
##9 DAL 43 30 24 32 24 27 
##10 DEN 26 32 35 31 34 47 

結果:

##   Player Tm  week1  week2  week3  week4  week5  week6 
##1  A.J. Green CIN 0.4333333 0.1481481 0.3928571 0.3870968 0.2051282 0.3225806 
##2 Aaron Burbridge SFO  NA  NA  NA  NA  NA  NA 
##3 Aaron Ripkowski GNB  NA  NA  NA  NA  NA  NA 
##4 Adam Humphries TAM  NA  NA  NA  NA  NA  NA 
##5 Adam Thielen MIN  NA  NA  NA  NA  NA  NA 
##6 Adrian Peterson MIN  NA  NA  NA  NA  NA  NA 
2

如果你下次提供一些數據,這會讓你的生活變得輕鬆很多,那將會很不錯。

我認爲重點是你的數據結構。我認爲你必須把你的數據轉換成長格式(關鍵字是我想的整潔數據)。我編寫了一些數據,希望我能正確理解你的問題。

library(tidyr) 
library(dplyr) 


player_df = data.frame(team = c('ARI', 'BAL', 'BAL', 'CLE', 'CLE'), 
         player =c('A', 'B', 'C', 'D', 'F'), 
         '1' = floor(runif(5, min=1, max=2)*10), 
         '2' = floor(runif(5, min=1, max=2)*10)) 
> player_df 
    team player X1 X2 
1 ARI  A 15 10 
2 BAL  B 16 15 
3 BAL  C 13 11 
4 CLE  D 14 19 
5 CLE  F 12 14 

team_df = data.frame(team = c('ARI', 'BAL', 'CLE'), 
         '1' = floor(runif(3, min=10, max=20)*20), 
         '2' = floor(runif(3, min=10, max=20)*20)) 
> team_df 
    team X1 X2 
1 ARI 281 205 
2 BAL 362 309 
3 CLE 323 238 

現在,把兩者dataframes爲長格式:

player_df = gather(player_df, week, player_value, -team, -player) 
team_df = gather(team_df, week, team_value, -team) 

> player_df 
    team player week player_value 
1 ARI  A X1   15 
2 BAL  B X1   16 
3 BAL  C X1   13 
4 CLE  D X1   14 
5 CLE  F X1   12 
6 ARI  A X2   10 
7 BAL  B X2   15 
8 BAL  C X2   11 
9 CLE  D X2   19 
10 CLE  F X2   14 
> team_df 
    team week team_value 
1 ARI X1  281 
2 BAL X1  362 
3 CLE X1  323 
4 ARI X2  205 
5 BAL X2  309 
6 CLE X2  238 

現在,加入(或合併)在一起。默認情況下,inner_join將加入公共列名稱。

join_db = inner_join(player_df, team_df) 
> join_db 
    team player week player_value team_value 
1 ARI  A X1   15  281 
2 BAL  B X1   16  362 
3 BAL  C X1   13  362 
4 CLE  D X1   14  323 
5 CLE  F X1   12  323 
6 ARI  A X2   10  205 
7 BAL  B X2   15  309 
8 BAL  C X2   11  309 
9 CLE  D X2   19  238 
10 CLE  F X2   14  238 

我認爲在這種格式下你可以做更多的事情。

HTH

斯特凡

相關問題