How to pass column variables to an apply function?

I have this data.frame:

id | amount1 | amount2 | day1 | day2
---+---------+---------+------+-----
A  |      10 |      32 |    0 |   34
B  |      54 |      44 |    8 |   43
C  |      45 |      66 |   16 |   99

df <- data.frame(id=c('A','B','C'), amount1=c(10,54,45), amount2=c(32,44,66), day1=c(0,8,16), day2=c(34,43,99))

On it, I want to apply a function:

df$res <- apply(df, 1, myfunc)

where

myfunc <- function(x,y) sum(x) * mean(y)

except that I want to pass column variables as arguments to the function, so that it essentially reads

apply(df, 1, myfunc, c(amount1, amount2), c(day1, day2))

which, for the first row, is

myfunc(c(10,32),c(0,34)) 
# [1] 714

Can this be done?


2013-01-21 jenswirf

Like this:

df$res <- apply(df, 1, function(x) myfunc(as.numeric(x[c("amount1", "amount2")]), 
              as.numeric(x[c("day1", "day2")])))
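For reference, here is a self-contained version of this answer, assuming the df and myfunc definitions from the question. apply() first converts the data frame to a matrix, and because id is a character column that matrix is a character matrix, which is why each slice goes through as.numeric() before reaching myfunc.

```r
# Data and function from the question
df <- data.frame(id = c('A', 'B', 'C'),
                 amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                 day1 = c(0, 8, 16), day2 = c(34, 43, 99))
myfunc <- function(x, y) sum(x) * mean(y)

# Each row x arrives as a named character vector (because of the id column),
# so select the cells by name and convert them back to numbers
df$res <- apply(df, 1, function(x)
  myfunc(as.numeric(x[c("amount1", "amount2")]),
         as.numeric(x[c("day1", "day2")])))

print(df$res)
```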

But consider plyr::adply as an alternative:

library(plyr) 
adply(df, 1, transform, res = myfunc(c(amount1, amount2), c(day1, day2))) 
# id amount1 amount2 day1 day2 res 
# 1 A  10  32 0 34 714.0 
# 2 B  54  44 8 43 2499.0 
# 3 C  45  66 16 99 6382.5


2013-01-21 15:23:47 flodel

This works for your example. Perhaps the same technique can be used for your real problem:

> apply(df[-1], 1, function(x) myfunc(x[1:2], x[3:4])) 
## [1] 714.0 2499.0 6382.5

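As flodel points out, it is better to subset by column name, so that only those columns are used by apply. Subsetting is needed to keep the vectors that apply passes to the function from being converted to character, and naming the columns explicitly means that other columns in the data frame cannot cause this problem.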

apply(df[c("amount1", "amount2", "day1", "day2")], 1, 
     function(x) myfunc(x[1:2], x[3:4]) 
    )
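The character conversion mentioned above is easy to check: apply() works on as.matrix() of the data frame, and a matrix holds a single type. A minimal check, using the question's df:

```r
df <- data.frame(id = c('A', 'B', 'C'),
                 amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                 day1 = c(0, 8, 16), day2 = c(34, 43, 99))

# With the character id column included, every cell becomes a string
typeof(as.matrix(df))                                            # "character"

# Keeping only the numeric columns (selected by name) keeps them numeric
typeof(as.matrix(df[c("amount1", "amount2", "day1", "day2")]))   # "double"
```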

In practice, I would find it easier to write something like this:

amount <- c("amount1", "amount2") 
day <- c("day1", "day2") 

df$res <- apply(df[c(amount, day)], 1, function(x) myfunc(x[amount], x[day]))


2013-01-21 15:13:16

Rather than 'df[-1]', prefer 'df[c("amount1", "amount2", "day1", "day2")]'. – flodel

Why is passing names better? It isn't easier to maintain... here it is easier to drop a column. – agstudy

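@agstudy: try running all of the proposed solutions on 1) 'df', 2) 'cbind(inserted = "hi", df)', and 3) 'cbind(df, append = "ho")'. Good code should be robust to any such change in the input, and that requires using column names rather than column indices. It is also the opposite of harder to maintain: if a column is inserted or removed, you don't need to increment or decrement any indices. – flodel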
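flodel's test can be run directly. The sketch below, assuming the question's df and myfunc, applies the name-based version and the index-based df[-1] version to the three inputs he lists; the helper names by_name and by_index are hypothetical, introduced only for this comparison.

```r
df <- data.frame(id = c('A', 'B', 'C'),
                 amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                 day1 = c(0, 8, 16), day2 = c(34, 43, 99))
myfunc <- function(x, y) sum(x) * mean(y)

amount <- c("amount1", "amount2")
day <- c("day1", "day2")

# Hypothetical helpers: column selection by name vs. by position
by_name  <- function(d) apply(d[c(amount, day)], 1, function(x) myfunc(x[amount], x[day]))
by_index <- function(d) apply(d[-1], 1, function(x) myfunc(x[1:2], x[3:4]))

inputs <- list(original = df,
               inserted = cbind(inserted = "hi", df),
               appended = cbind(df, append = "ho"))

# The name-based version gives the same three values for every input
lapply(inputs, by_name)

# The index-based version picks up a character column on the modified inputs,
# so sum() fails; wrap it in try() to see the errors instead of stopping
lapply(inputs, function(d) try(by_index(d), silent = TRUE))
```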

A data.table solution:

require(data.table) 
dt <- data.table(df) # don't depend on `id` column as it may not be unique 
# instead use 1:nrow(dt) in `by` argument 
dt[, res := myfunc(c(amount1,amount2), c(day1, day2)), by=1:nrow(dt)] 
> dt 
# id amount1 amount2 day1 day2 res 
# 1: A  10  32 0 34 714.0 
# 2: B  54  44 8 43 2499.0 
# 3: C  45  66 16 99 6382.5
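The comment about using by=1:nrow(dt) instead of id matters when ids repeat. The sketch below shows the same idea in base R rather than data.table (it swaps in split() for grouping) on a hypothetical copy df2 whose second id is changed to "A":

```r
df <- data.frame(id = c('A', 'B', 'C'),
                 amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                 day1 = c(0, 8, 16), day2 = c(34, 43, 99))
myfunc <- function(x, y) sum(x) * mean(y)

# Hypothetical data with a duplicated id
df2 <- df
df2$id <- c("A", "A", "C")

# Apply myfunc to one group of rows
row_res <- function(g) myfunc(c(g$amount1, g$amount2), c(g$day1, g$day2))

# Grouping by id pools rows 1 and 2 into a single group: one result per id
sapply(split(df2, df2$id), row_res)

# Grouping by row number gives one result per row, like by=1:nrow(dt)
sapply(split(df2, seq_len(nrow(df2))), row_res)
```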

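When you have many days columns whose mean you want to take and multiply by the sum of amount1 and amount2, I would do it this way, without using myfunc. But if you really need a function, it should be simple to implement one.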

# dummy example 
set.seed(45) 
df <- data.frame(matrix(sample(1:100, 200, replace=T), ncol=10)) 
names(df) <- c(paste0("amount", 1:2), paste0("day", 1:8)) 
df$idx <- 1:nrow(df) # idx column for uniqueness 

# create a data.table 
require(data.table) 
calc_res <- function(df) { 
    dt <- data.table(df) 
    # first get the mean 
    id1 <- setdiff(names(dt), grep("day", names(dt), value=TRUE)) 
    dt[, res := rowMeans(.SD), by=id1] 
    # now product of sum(amounts) and current res 
    id2 <- setdiff(names(dt), names(dt)[1:2]) 
    dt[, res := sum(.SD) * res, by=id2] 
} 
dt.fin <- calc_res(df)
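Because myfunc is just a sum times a mean, the same calculation can also be done without any per-row function call. The sketch below is a base-R alternative, not Arun's data.table code: it assumes the columns follow the amount*/day* naming of the dummy example and uses rowSums() and rowMeans() on columns picked out with grep(), then cross-checks the result against myfunc row by row.

```r
# Dummy data shaped like the example above: 2 amount columns, 8 day columns
set.seed(45)
df <- data.frame(matrix(sample(1:100, 200, replace = TRUE), ncol = 10))
names(df) <- c(paste0("amount", 1:2), paste0("day", 1:8))

amount_cols <- grep("^amount", names(df), value = TRUE)
day_cols    <- grep("^day", names(df), value = TRUE)

# Vectorised: sum of the amounts times mean of the days, for all rows at once
df$res <- rowSums(df[amount_cols]) * rowMeans(df[day_cols])

# Cross-check against calling myfunc on each row
myfunc <- function(x, y) sum(x) * mean(y)
check <- apply(df[c(amount_cols, day_cols)], 1,
               function(x) myfunc(x[amount_cols], x[day_cols]))
all.equal(unname(df$res), unname(check))
```

The vectorised form avoids both the character-conversion issue and any dependence on a unique id column, since it never groups rows.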


2013-01-21 15:36:08 Arun

You beat me to it! – agstudy

What would you do if the ids were not unique? – flodel

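I didn't mean for you to delete your comment. It really is an important point that your solution assumes the ids are unique; if they are not, it won't give the expected result. I was only asking whether you could consider a general data.table answer. – flodel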
