You can combine `()` on the LHS of `:=` with `with = FALSE` to reference the variables on the RHS.
library(data.table)
dt <- data.table(a = 1:5, b = 10:14)
my_add <- function(dt, summand1Name, summand2Name, resultName) {
  # Look up both summand columns by name; the second term must use summand2Name
  dt[, (resultName) := dt[, summand1Name, with = FALSE] +
         dt[, summand2Name, with = FALSE]]
}
my_add(dt, 'a', 'b', 'c')
dt
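To illustrate why the parentheses on the LHS matter, here is a minimal sketch. The table `demo` and the variable `nm` are made up for illustration and are not part of the answer. Without parentheses, `:=` takes the symbol as a literal column name; with them, it uses the value stored in the variable.

```r
library(data.table)

# Hypothetical example data, not from the answer above
demo <- data.table(a = 1:3)
nm <- "total"

# Without parentheses: creates a column literally called "nm"
demo[, nm := a * 2L]

# With parentheses: creates a column named by the value of nm, i.e. "total"
demo[, (nm) := a * 2L]

names(demo)
```

After both calls the table should contain the columns `a`, `nm` and `total`.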
Edit:
Comparing the versions below, mine is the least efficient... (but I will keep it for reference).
set.seed(1)
dt <- data.table(a = rnorm(10000), b = rnorm(10000))
original_add <- function(dt, summand1Name, summand2Name, resultName) {
  cmd = paste0('dt = dt[, ', resultName, ' := ', summand1Name, ' + ', summand2Name, ']')
  eval(parse(text=cmd))
  return(dt) # optional since manipulated by reference
}
my_add <- function(dt, summand1Name, summand2Name, resultName) {
  # Look up both summand columns by name; the second term must use summand2Name
  dt[, (resultName) := dt[, summand1Name, with = FALSE] +
         dt[, summand2Name, with = FALSE]]
}
list_access_add <- function(dt, summand1Name, summand2Name, resultName) {
  dt[, (resultName) := dt[[summand1Name]] + dt[[summand2Name]]]
}
david_add <- function(dt, summand1Name, summand2Name, resultName) {
  dt[, (resultName) := .SD[[summand1Name]] + .SD[[summand2Name]]]
}
microbenchmark::microbenchmark(
original_add(dt, 'a', 'b', 'c'),
my_add(dt, 'a', 'b', 'c'),
list_access_add(dt, 'a', 'b', 'c'),
david_add(dt, 'a', 'b', 'c'))
## Unit: microseconds
##                                expr      min        lq      mean    median        uq      max neval
##     original_add(dt, "a", "b", "c")  604.397  659.6395  784.2206  713.0315  776.1295 5070.541   100
##           my_add(dt, "a", "b", "c") 1063.984 1168.6140 1460.5329 1247.7990 1486.9730 6134.959   100
##  list_access_add(dt, "a", "b", "c")  272.822  310.9680  422.6424  334.3110  380.6885 3620.463   100
##        david_add(dt, "a", "b", "c")  389.389  431.9080  542.7955  454.5335  493.4895 3696.992   100
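Before reading too much into timings like these, it can be worth confirming that the variants actually compute the same column. The following is only a sketch under stated assumptions: it uses small made-up data and re-declares two of the variants under hypothetical names (`add_list`, `add_sd`) so that the block runs on its own.

```r
library(data.table)

# Hypothetical re-declarations so the block is self-contained
add_list <- function(dt, s1, s2, res) {
  dt[, (res) := dt[[s1]] + dt[[s2]]]
}
add_sd <- function(dt, s1, s2, res) {
  dt[, (res) := .SD[[s1]] + .SD[[s2]]]
}

# Each variant works on its own copy, because := modifies by reference
base <- data.table(a = 1:5, b = 10:14)
x <- add_list(copy(base), "a", "b", "c")
y <- add_sd(copy(base), "a", "b", "c")

# Both results should match the directly computed sum
expected <- base$a + base$b
stopifnot(identical(x$c, expected), identical(y$c, expected))
```

A check of this kind catches a function that silently adds the wrong columns, which a benchmark alone would not reveal.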
EDIT2:
With one million rows, the results look like this. As expected, the original method performs well, because once the `eval` is done the rest runs quickly.
## Unit: milliseconds
##                                expr       min        lq      mean    median        uq      max neval
##     original_add(dt, "a", "b", "c")  2.493553  3.499039  6.585651  3.607101  4.390051 114.0612   100
##           my_add(dt, "a", "b", "c") 11.821820 14.512878 28.387841 17.412433 19.642231 117.6359   100
##  list_access_add(dt, "a", "b", "c")  2.161276  3.133110  6.874885  3.218185  3.407776 107.6853   100
##        david_add(dt, "a", "b", "c")  2.237089  3.313133  6.047832  3.381757  3.788558 103.7532   100
You should probably take a look at `get` and `mget`.
See [this](https://stackoverflow.com/questions/27677283/evaluating-both-column-name-and-the-target -d).
`add = function(dt, summand1Name, summand2Name, resultName) dt[, (resultName) := .SD[[summand1Name]] + .SD[[summand2Name]]]`? Another option could be `add2 = function(dt, summand1Name, summand2Name, resultName) dt[, (resultName) := eval(as.name(summand1Name)) + eval(as.name(summand2Name))]`, or `get` as mentioned above.
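The `get`/`mget` suggestion from the comments could look roughly like the sketch below. The helper names `add_get` and `add_mget` are hypothetical, and combining `mget` with `Reduce` is just one assumed way to sum several columns by name; the comments do not spell out a specific form.

```r
library(data.table)

# Hypothetical helper: get() resolves each column name inside j
add_get <- function(dt, s1, s2, res) {
  dt[, (res) := get(s1) + get(s2)]
}

# Hypothetical helper: mget() returns a list of columns, summed element-wise
add_mget <- function(dt, cols, res) {
  dt[, (res) := Reduce(`+`, mget(cols))]
}

dt <- data.table(a = 1:5, b = 10:14)
add_get(dt, "a", "b", "c")
add_mget(dt, c("a", "b"), "d")
dt
```

Both helpers modify `dt` by reference, so `c` and `d` should end up holding the same sums of `a` and `b`. The `mget` form also accepts more than two summands.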