data.frame: create a column by applying a function to groups of rows

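I have a data frame that holds the results of several experiment runs. Each run acts as a log and has its own ascending counter. I would like to add another column to the data frame that holds the maximum value of iteration for each distinct value of experiment.num, as in the example below: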

df <- data.frame(
    iteration = rep(1:5,5), 
    experiment.num = c(rep(1,5),rep(2,5),rep(3,5),rep(4,5),rep(5,5)), 
    some.val=42, 
    another.val=12 
)

In this example, the extra column would look like this (since every subset has the same maximum iteration):

df$max <- rep(5,25)

The naive solution I currently use is:

df$max <- sapply(df$experiment.num,function(exp.num) max(df$iteration[df$experiment.num == exp.num]))

I have also used sapply(unique(df$experiment.num), function(n) c(n,max(df$iteration[df$experiment.num==n]))) to build another frame, which I can then merge with the original, but both approaches seem more complicated than necessary.
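For illustration, the "build a summary frame, then merge" idea might look roughly like the sketch below. It swaps the sapply call for aggregate, which returns a data frame directly; the names per_exp and merged are made up for this example and are not from the original post.

```r
# Example data from the question
df <- data.frame(
    iteration = rep(1:5, 5),
    experiment.num = rep(1:5, each = 5),
    some.val = 42,
    another.val = 12
)

# One row per experiment, holding that experiment's maximum iteration
per_exp <- aggregate(iteration ~ experiment.num, data = df, FUN = max)
names(per_exp)[names(per_exp) == "iteration"] <- "max"

# Join the per-experiment maximum back onto every row of the original frame
merged <- merge(df, per_exp, by = "experiment.num")
```

Note that merge may reorder rows and columns, so this is not a drop-in replacement for assigning df$max in place.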

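The experiment.num column is a factor, so I thought I might be able to exploit that to avoid naively subsetting again for every single row.

Is there a better way to get a column of maximum values for subsets of a data.frame?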

2012-06-13 Mathew Hall

Using plyr:

library(plyr)
ddply(df, .(experiment.num), transform, max = max(iteration))

2012-06-13 14:50:36 Julius

Thanks for the pointer to 'plyr'; it looks like a very useful package.

Here is a base R way:

within(df[order(df$experiment.num), ], 
     max <- rep(tapply(iteration, experiment.num, max), 
        rle(experiment.num)$lengths))

2012-06-13 15:25:16

Using ave in base R:

df$i_max <- with(df, ave(iteration, experiment.num, FUN=max))
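As a quick sanity check, the sketch below compares ave against the naive per-row subsetting from the question. The data here is hypothetical (unequal group sizes, experiment.num stored as a factor) so that the per-group maxima actually differ.

```r
# Hypothetical data: three experiments of different lengths, grouping column is a factor
df <- data.frame(
    iteration = c(1:3, 1:5, 1:2),
    experiment.num = factor(c(rep("a", 3), rep("b", 5), rep("c", 2)))
)

# ave applies FUN within each group and returns a vector aligned with the rows
df$i_max <- with(df, ave(iteration, experiment.num, FUN = max))

# Naive per-row subsetting, as in the question, for comparison
naive <- sapply(df$experiment.num,
                function(e) max(df$iteration[df$experiment.num == e]))

# Both approaches should agree row by row
stopifnot(all(df$i_max == naive))
```

Because ave keeps the original row order, no sorting or merging is needed afterwards.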

2012-06-14 04:21:09

I think you can use data.table:

install.packages("data.table")  # only needed once
library("data.table")
dt <- data.table(df)  # convert your data frame into a data.table
# add a new column i_max holding the maximum iteration within each experiment.num group
dt[, i_max := max(iteration), by = experiment.num]

2014-03-08 00:19:31 user2621147
