2014-05-21

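data.table object redundancy

I'm trying to get used to data.table notation and would like to clean up this piece of code. I suspect there is a better, less memory-hungry way to approach this. I need to compute some basic metrics on an existing data frame. Can I do that without creating multiple data tables? Also, how should I handle the NaN that results when the denominator is 0? I would like to print 0 instead.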

library("Lahman") 
library("ggplot2") 
library("data.table") 

DT <- na.omit(data.table(PlayerId = Batting$playerID, SB = Batting$SB,
                         CS = Batting$CS, G = Batting$G))

DTa <- (DT[, list(TotalSB = sum(SB), TotalCS = sum(CS), TotalG = sum(G)),
           by = 'PlayerId'])

DTb <- (DTa[,
            list(PlayerId, TotalSB, TotalCS, TotalG,
                 SBAttempts = TotalSB + TotalCS,
                 SBSuccess = TotalSB/(TotalSB + TotalCS),
                 SBPerGame = TotalSB/TotalG)
            ])

print(DTb) 

Answer

Well, here is a slightly more compact way.

# quotes are optional in `by = ...`
DTa <- DT[, list(TotalSB = sum(SB), TotalCS = sum(CS), TotalG = sum(G)),
          by = PlayerId]
# use c(...) := list(...) to add multiple columns by reference
DTa[, c("SBAttempts", "SBSuccess", "SBPerGame") :=
      list(TotalSB + TotalCS, TotalSB / (TotalSB + TotalCS), TotalSB / TotalG)]
# replace NaN (from 0/0) with 0 in the three new columns (positions 5 to 7)
DTa[, names(DTa)[5:7] := lapply(.SD, function(x) ifelse(is.nan(x), 0, x)),
    .SDcols = 5:7]

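This still creates one new data table, DTa, because the aggregated table has fewer rows than the original. The TotalXX columns come from that aggregation; the derived columns (SBAttempts, SBSuccess, SBPerGame) are then added, and the NaN values converted to 0, by reference (no copy).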
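
To try the by-reference behaviour without installing Lahman, here is a minimal sketch on a small made-up table. The toy data and the names `ratio_cols` and `addr_before` are illustrative assumptions, not part of the original code. It also swaps in data.table's `fifelse()` for base `ifelse()`, which assumes a reasonably recent data.table release; `ifelse()` works the same way here. Columns are selected by name rather than by position, so the code does not depend on column order.

```r
library(data.table)

# Hypothetical toy data standing in for the Lahman Batting table
DT <- data.table(
  PlayerId = c("a", "a", "b", "c"),
  SB = c(2L, 3L, 0L, 1L),
  CS = c(1L, 0L, 0L, 1L),
  G  = c(10L, 20L, 5L, 4L)
)

# Aggregating builds one new, smaller table (one row per player)
DTa <- DT[, .(TotalSB = sum(SB), TotalCS = sum(CS), TotalG = sum(G)),
          by = PlayerId]

# Remember where DTa lives so we can compare after the updates
addr_before <- address(DTa)

# Add the derived columns by reference using the functional form of `:=`
DTa[, `:=`(
  SBAttempts = TotalSB + TotalCS,
  SBSuccess  = TotalSB / (TotalSB + TotalCS),
  SBPerGame  = TotalSB / TotalG
)]

# Replace NaN (produced by 0/0) with 0, again by reference
ratio_cols <- c("SBSuccess", "SBPerGame")
DTa[, (ratio_cols) := lapply(.SD, function(x) fifelse(is.nan(x), 0, x)),
    .SDcols = ratio_cols]

# Compare the address before and after the `:=` calls
print(identical(addr_before, address(DTa)))
print(DTa)
```

Player "b" has no steals and no caught-stealing, so SBSuccess is 0/0 before the replacement step and 0 afterwards.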