2014-02-23

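Converting an OR-condition subset from data.frame to data.table

I am trying to convert a subset operation from data.frame to data.table to improve the performance of my code, but I am completely new to data.table. What is the data.table equivalent of this kind of subset expression?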

for(ii in 1:nplayer) 
    { 
    subgame<-subset(game, game$playerA == player[ii] | game$playerB == player[ii]) 
    players[ii,4]<-nrow(subgame) 
    } 

I have defined a new data.table, gameDT, like this:

gameDT<-data.table(game) 
    setkey(gameDT,playerA,playerB) 

Output of dput:

>dput(game[1:2,]) 
    structure(list(country = c("New Zealand", "Australia"), tournament = c("WTA Auckland 2012", 
    "WTA Brisbane 2012"), date = c("2011-12-31 00:00:00", "2011-12-30 00:15:00" 
    ), playerA = c("Schoofs B.", "Lucic M."), playerB = c("Puig M.", 
    "Tsurenko L."), resultA = c(1L, 1L), resultB = c(2L, 2L), oddA = c("1.8", 
    "2.17"), oddB = c("1.9", "1.57"), N = c(4L, 3L), Weight = c(1, 
    0.973608997871031)), .Names = c("country", "tournament", "date", 
    "playerA", "playerB", "resultA", "resultB", "oddA", "oddB", "N", 
    "Weight"), row.names = 1:2, class = "data.frame") 
Can you dput the dataset or a subset of it (e.g. dput(game[1:20,]))?

The subset syntax in 'data.table' is simply 'dt[playerA == "a" | playerB == "a"]'.
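To make the comment above concrete, here is a minimal sketch on a made-up four-row table. The toy data and the names sub_a and counts are illustrative assumptions, not taken from the question. It filters with an OR condition in i and uses .N to count the matching rows, which stands in for the subset()/nrow() pair in the original loop.

```r
library(data.table)

# Toy stand-in for the question's `game` data (values are made up)
game <- data.frame(
  playerA = c("a", "b", "a", "c"),
  playerB = c("b", "c", "c", "a"),
  stringsAsFactors = FALSE
)
gameDT <- as.data.table(game)

# Rows where player "a" appears on either side
sub_a <- gameDT[playerA == "a" | playerB == "a"]

# Games per player: .N in j returns the number of rows selected by i
player <- union(gameDT$playerA, gameDT$playerB)
counts <- vapply(
  player,
  function(p) gameDT[playerA == p | playerB == p, .N],
  integer(1)
)
counts
```

Because the counts come back as a named integer vector, they can be assigned to a column in one step instead of row by row inside a loop.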

Answers

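You might consider using lapply, if this is not just an exercise in learning data.table.

I think the following example is equivalent to what you are trying to do, and you can see a fairly decent speedup from using lapply: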

set.seed(123) 
library(microbenchmark) 

game = data.frame(value = runif(50), playerA = sample(letters[1:5], 50, replace = TRUE), playerB = sample(letters[1:5], 50, replace = TRUE), stringsAsFactors = FALSE) 

player <- union(game$playerA, game$playerB) 
nplayer <- length(player) 
players <- matrix(player, nrow = nplayer, ncol = 2) 

op <- microbenchmark(
    LAPPLY = {counts <- lapply(1:nplayer, 
          function(i) sum(game$playerA == player[i] | game$playerB == player[i])) 
      names(counts) <- player }, 
    ORIG = { 
     for(ii in 1:nplayer) 
     { 
      subgame<-subset(game, game$playerA == player[ii] | game$playerB == player[ii]) 
      players[ii,2]<-nrow(subgame) 
     }}, 
    times = 1000) 

op 

#Unit: microseconds
#   expr     min       lq   median        uq       max neval
# LAPPLY 236.493 251.9985  259.095  269.3205  8323.701  1000
#   ORIG 938.194 981.9060 1002.880 1036.6705 61095.935  1000

unlist(counts) 

# a  c  d  b  e
#19 17 20 20 15

players 

#     [,1] [,2]
#[1,] "a"  "19"
#[2,] "c"  "17"
#[3,] "d"  "20"
#[4,] "b"  "20"
#[5,] "e"  "15"
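Thanks, but I am learning data.table. – emanuele

Could you explain more clearly what this statement means: 'names(counts) <- player' – emanuele

It names the elements of the list 'counts'. In this case it is the same as 'names(counts) <- c("a", "c", "d", "b", "e")', i.e. the names follow the order of 'player'.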
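A small sketch of what that naming step does, with the values hard-coded from the output shown in the answer (so the list here is an assumption standing in for the lapply result):

```r
# Player order and counts copied from the answer's printed output
player <- c("a", "c", "d", "b", "e")
counts <- list(19L, 17L, 20L, 20L, 15L)

# Attach one name per list element, in the same order as `player`
names(counts) <- player

# Elements can now be looked up by player name instead of by position
counts[["d"]]

# unlist() keeps the names, giving a named integer vector
unlist(counts)
```

Without the names, the only way to tell which count belongs to which player would be to keep the list aligned with 'player' by position.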
