2017-03-19

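Create a new column from the following row, based on a variable

I have position data for many athletes during a match. Each quarter lasts at most 30 minutes. An example of my data is: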

> df 
     StartValue Athlete Quarter Position 
    1  0.00 Paul  Q1 Bench 
    2  5.35 Paul  Q1 Defender 
    3  19.26 Paul  Q1 Bench 
    4  23.32 Paul  Q1 Defender 
    5  0.00 Paul  Q2 Bench 
    6  9.08 Paul  Q2 Defender 
    7  13.11 Paul  Q2 Defender 
    8  0.00 Paul  Q3 Defender 
    9  7.36 Paul  Q3 Defender 
    10  2.51 Paul  Q3 Bench 
    11  6.44 Paul  Q4 Bench 
    12  22.47 Paul  Q4 Bench 
    13  0.00 Paul  Q4 Defender 
    14  24.38 Paul  Q4 Defender 
    15  11.36 Paul  Q4 Defender 

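I now want to create a new column, df$EndValue, that takes the StartValue of the next entry in the same quarter and places it in this column. For the last entry of each quarter, df$EndValue must be 30. For example, the first few rows would be: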

 > df 
      StartValue Athlete Quarter Position EndValue 
     1  0.00 Paul  Q1 Bench 5.35 
     2  5.35 Paul  Q1 Defender 19.26 
     3  19.26 Paul  Q1 Bench 23.32 
     4  23.32 Paul  Q1 Defender 30.00 
     5  0.00 Paul  Q2 Bench 9.08 

My expected output as a data.frame would be:

Output <- data.frame(StartValue=c(0, 5.35, 19.26, 23.32, 
           0.00, 9.08, 13.11, 0, 
           2.51, 7.36, 0.0, 6.44, 
           11.36, 22.47, 24.38), 
        EndValue=c(5.35, 19.26, 23.32, 30, 
           9.08, 13.11, 30, 2.51, 
           7.36, 30, 6.44, 11.36, 
           22.47, 24.38, 30), 
        Athlete = c('Paul', 'Paul', 'Paul', 'Paul', 
           'Paul', 'Paul', 'Paul','Paul', 
           'Paul', 'Paul', 'Paul','Paul', 
           'Paul', 'Paul', 'Paul'), 
        Quarter = c('Q1', 'Q1', 'Q1', 'Q1', 
           'Q2', 'Q2', 'Q2', 'Q3', 
           'Q3', 'Q3', 'Q4', 'Q4', 
           'Q4', 'Q4', 'Q4'), 
        Position = c('Bench','Defender','Bench','Defender', 
           'Bench','Defender','Defender','Defender', 
           'Defender','Bench','Bench','Bench', 
           'Defender', 'Defender', 'Defender')) 

I have data like this, with 30-minute quarters, for many athletes. How can I add this new column quickly?

Thanks.

Answers


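Use setDT to convert the data frame to a data.table by reference. Group by athlete and quarter, set EndValue to the next row's StartValue, and assign 30 as the value for the last row of each group.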

library('data.table') 
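A minimal sketch of these steps, assuming the entries must first be sorted by StartValue within each athlete and quarter (in the sample data, Q3 and Q4 are out of order) and using the question's sample data:

```r
library(data.table)

# Sample data from the question (one athlete; Q3 and Q4 rows are not in time order)
df <- data.frame(
  StartValue = c(0.00, 5.35, 19.26, 23.32, 0.00, 9.08, 13.11, 0.00, 7.36, 2.51,
                 6.44, 22.47, 0.00, 24.38, 11.36),
  Athlete    = "Paul",
  Quarter    = rep(c("Q1", "Q2", "Q3", "Q4"), times = c(4, 3, 3, 5)),
  Position   = c("Bench", "Defender", "Bench", "Defender", "Bench", "Defender",
                 "Defender", "Defender", "Defender", "Bench", "Bench", "Bench",
                 "Defender", "Defender", "Defender")
)

setDT(df)                                   # convert to data.table by reference
setorder(df, Athlete, Quarter, StartValue)  # put entries in time order within each quarter

# EndValue = next row's StartValue within the same athlete and quarter;
# the last entry of each quarter gets 30 (the quarter length).
df[, EndValue := c(StartValue[-1], 30), by = .(Athlete, Quarter)]
print(df)
```

Because c(StartValue[-1], 30) always has exactly .N elements, this form also behaves correctly for quarters that contain a single entry.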

Edit:

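In your comment you asked for a different end value in each quarter. First copy StartValue into EndValue and shift it up by one row within each athlete and quarter, then find the row index of the last entry in each quarter. Finally, update EndValue at those rows using 31 for Q1, 32 for Q2, 33 for Q3 and 34 for Q4.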

I created two players, Paul and Bob. They have identical data except for their names.

# sample data: Paul and Bob with identical rows
setDT(df) # convert data frame to data table by reference 
df1 <- copy(df) # replicate data by copying df 
df[, Athlete := 'Bob'] # assign Bob as the Athlete in the copy 
df <- rbindlist(l = list(df1, df)) # combine df1 and df 

# sort StartValue by player and quarter 
df <- df[order(StartValue), .SD, by = .(Athlete, Quarter) ] 

# assign StartValue to EndValue 
df[, EndValue := StartValue ] 

# remove 1st, shift values up and assign NA to last 
df[, EndValue := c(EndValue[-1], NA), by = .(Athlete, Quarter)] 

# end value of each quarter, looked up by quarter name
# (does not assume every athlete has all four quarters)
quarter_end <- c(Q1 = 31, Q2 = 32, Q3 = 33, Q4 = 34)
last_rows <- df[, .I[.N], by = .(Athlete, Quarter)]$V1 # last row of each athlete/quarter
df[last_rows, EndValue := unname(quarter_end[as.character(Quarter)])]

df 
# Athlete Quarter StartValue Position EndValue 
# 1: Paul  Q1  0.00 Bench  5.35 
# 2: Paul  Q1  5.35 Defender 19.26 
# 3: Paul  Q1  19.26 Bench 23.32 
# 4: Paul  Q1  23.32 Defender 31.00 
# 5: Paul  Q2  0.00 Bench  9.08 
# 6: Paul  Q2  9.08 Defender 13.11 
# 7: Paul  Q2  13.11 Defender 32.00 
# 8: Paul  Q3  0.00 Defender  2.51 
# 9: Paul  Q3  2.51 Bench  7.36 
# 10: Paul  Q3  7.36 Defender 33.00 
# 11: Paul  Q4  0.00 Defender  6.44 
# 12: Paul  Q4  6.44 Bench 11.36 
# 13: Paul  Q4  11.36 Defender 22.47 
# 14: Paul  Q4  22.47 Bench 24.38 
# 15: Paul  Q4  24.38 Defender 34.00 
# 16:  Bob  Q1  0.00 Bench  5.35 
# 17:  Bob  Q1  5.35 Defender 19.26 
# 18:  Bob  Q1  19.26 Bench 23.32 
# 19:  Bob  Q1  23.32 Defender 31.00 
# 20:  Bob  Q2  0.00 Bench  9.08 
# 21:  Bob  Q2  9.08 Defender 13.11 
# 22:  Bob  Q2  13.11 Defender 32.00 
# 23:  Bob  Q3  0.00 Defender  2.51 
# 24:  Bob  Q3  2.51 Bench  7.36 
# 25:  Bob  Q3  7.36 Defender 33.00 
# 26:  Bob  Q4  0.00 Defender  6.44 
# 27:  Bob  Q4  6.44 Bench 11.36 
# 28:  Bob  Q4  11.36 Defender 22.47 
# 29:  Bob  Q4  22.47 Bench 24.38 
# 30:  Bob  Q4  24.38 Defender 34.00 
#  Athlete Quarter StartValue Position EndValue 

How would I add this if the quarters differ in length, e.g. Q1 = 30 minutes and Q2 = 31 minutes? Thanks! – user2716568


When I run this on a larger dataset, df <- setDT(df)[, EndValue := c(StartValue[1:(.N-1)], 30), by = .(Athlete, Quarter)] returns the following warning: '14: In `[.data.table`(setDT(df), , `:=`(EndValue, c(StartValue[1:(.N - ...: RHS 1 is length 2 (greater than the size (1) of group 92). The last 1 element(s) will be discarded.' – user2716568
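The warning can be reproduced without data.table. For a one-row group, .N - 1 is 0, and 1:0 counts down to c(1, 0) instead of producing an empty vector, so the right-hand side ends up one element too long. Note also that StartValue[1:(.N-1)] keeps each row's own StartValue rather than the next row's. A minimal sketch, with a hypothetical variable N standing in for .N:

```r
# Hypothetical one-row group: N plays the role of data.table's .N
N <- 1L
x <- c(12.5)               # the group's single StartValue

idx <- 1:(N - 1)           # 1:0 counts down, giving c(1L, 0L), not an empty vector
len_buggy <- length(c(x[idx], 30))   # x[c(1, 0)] drops the 0 and returns x[1]: 2 values

# Dropping the first element always yields exactly N values
len_fixed <- length(c(x[-1], 30))    # x[-1] is numeric(0): 1 value
```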


Yes, I have up to 45 athletes. They all have the same number of quarters, but the number of rows in each quarter differs. – user2716568


Here is a solution using dplyr:

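library(dplyr) 
quarter_lengths <- c(Q1 = 31, Q2 = 32, Q3 = 30, Q4 = 33) 
df %>% 
    arrange(Athlete, Quarter, StartValue) %>%  # the next entry in time must be the next row
    group_by(Athlete, Quarter) %>% 
    mutate(EndValue = c(StartValue[-1], quarter_lengths[[as.character(Quarter[1])]])) %>% 
    ungroup()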

If it gets more complicated, e.g. several games with different quarter lengths, I would create a new data.frame of quarter lengths and inner_join it.
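A sketch of that join-based variant, assuming a hypothetical Game column and made-up quarter lengths (neither is in the original data). Each row picks up its quarter's length from the lookup table, and the last entry of each group uses it as EndValue:

```r
library(dplyr)

# Hypothetical lookup table: one row per Game/Quarter with its length in minutes
quarter_table <- data.frame(
  Game          = c("G1", "G1", "G2", "G2"),
  Quarter       = c("Q1", "Q2", "Q1", "Q2"),
  QuarterLength = c(30, 31, 32, 30),
  stringsAsFactors = FALSE
)

# Hypothetical position data covering two games
df <- data.frame(
  Game       = c("G1", "G1", "G1", "G2", "G2"),
  Athlete    = "Paul",
  Quarter    = c("Q1", "Q1", "Q2", "Q1", "Q2"),
  StartValue = c(0, 5.35, 0, 0, 0),
  stringsAsFactors = FALSE
)

result <- df %>%
  inner_join(quarter_table, by = c("Game", "Quarter")) %>%  # attach each quarter's length
  arrange(Game, Athlete, Quarter, StartValue) %>%            # time order within each quarter
  group_by(Game, Athlete, Quarter) %>%
  mutate(EndValue = c(StartValue[-1], QuarterLength[1])) %>% # last entry gets the quarter length
  ungroup() %>%
  select(-QuarterLength)
```

Keeping the lengths in a table rather than a named vector means new games or quarters only require new rows in quarter_table, not code changes.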


Love a 'dplyr' solution! This works well. – user2716568