Applying a function group by group (tapply within ddply)

My dataset looks like this:

d = data.frame(year=rep(2000:2002,each=40),month=rep(c(rep(1:12,3),5,6,7,8),3),species=rep(c(rep(letters[1:12],3),"a","b","g","l"),3),species_group=NA,kg=round(rnorm(120,15,6),digits=2)) 
d$species_group=ifelse(d$species %in% letters[1:5],"A","B")

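For each year and species group (so ignoring the month and species levels), I would like the mean weight and the number of species included. This works fine with ddply. However, I would also like to include a "quality" value for my data: that is, whether the number of species per month is balanced or whether, for example, more species are included in the summer months. So I thought I could simply compute the yearly standard deviation of the number of unique species per month. I tried to do this inside ddply with tapply as follows: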

s=ddply(d,c("year","species_group"),function(x) cbind(n_species=length(unique(x$species)), 
                quality=tapply(x,x$month,sd(length(unique(x$species)))), 
                kg=sum(x$kg,na.rm=T)))

But this gives me an error:

Error in match.fun(FUN) : 'sd(length(unique(x$species)))' is not a function, character or symbol
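
The message points at the third argument of `tapply`: it has to be a function, which `tapply` then calls once per group, whereas `sd(length(unique(x$species)))` is a call rather than a function. The first argument should also be a vector, not the whole data frame. A minimal sketch of the intended computation on a single piece of the data, using a made-up toy subset (the values are hypothetical, only the column names follow the question):

```r
# Toy subset standing in for one year/species_group piece (hypothetical data)
x <- data.frame(month   = c(1, 1, 2, 2, 2, 3),
                species = c("a", "a", "a", "b", "c", "b"))

# FUN must be a function; it is called once per month on that month's species
n_per_month <- tapply(x$species, x$month, function(y) length(unique(y)))
# Monthly counts here: month 1 -> 1, month 2 -> 3, month 3 -> 1

# A single number summarising how uneven the monthly counts are
quality <- sd(n_per_month)
```

Taking `sd` outside of `tapply` is what turns the per-month counts into one value per group.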

What I would like to obtain is something like this:

output=data.frame(year=rep(2000:2002,each=2),species_group=rep(c("A","B"),3),n_species=rep(c(7,9),3),quality=round(rnorm(6,2,0.3),digits=2),kg=round(rnorm(6,15,6),digits=2))

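I cannot first use ddply on month, year and species group, because then I would no longer know the number of unique species per year. I suppose I could also compute n_species and quality separately and then put them together, but that would be a cumbersome approach.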
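
For comparison, the "compute separately, then combine" route mentioned above might look roughly like the following in base R. This is only a sketch under assumptions: `d_toy` is a made-up miniature of the data, and the intermediate names (`n_species`, `per_month`, `quality`, `result`) are illustrative.

```r
# Toy data (hypothetical), using the same column names as the question
d_toy <- data.frame(
  year          = c(2000, 2000, 2000, 2000, 2001, 2001, 2001),
  month         = c(1, 1, 2, 2, 1, 2, 2),
  species       = c("a", "b", "a", "a", "a", "a", "b"),
  species_group = "A"
)

count_unique <- function(s) length(unique(s))

# Step 1: number of unique species per year and species group
n_species <- aggregate(d_toy["species"],
                       by = d_toy[c("year", "species_group")],
                       FUN = count_unique)
names(n_species)[3] <- "n_species"

# Step 2: unique species per month, then the sd of those monthly counts
per_month <- aggregate(d_toy["species"],
                       by = d_toy[c("year", "species_group", "month")],
                       FUN = count_unique)
quality <- aggregate(per_month["species"],
                     by = per_month[c("year", "species_group")],
                     FUN = sd)
names(quality)[3] <- "quality"

# Step 3: put the two summaries back together
result <- merge(n_species, quality, by = c("year", "species_group"))
```

It works, but it needs three passes over the data and a merge, which is why a single grouped function is more convenient.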

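How can I make my function work, or what would be a more correct way to do this?

Answers:

The simplest solution came from shadow, who pointed out my mistake in using tapply. In addition, the standard error seems more appropriate than the standard deviation, which gives the following formula: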

s=ddply(d,c("year","species_group"),function(x) cbind(n_species=length(unique(x$species)), 
                quality=sd(tapply(x$species,x$month, function(y) length(unique(y))))/sqrt(length(tapply(x$species,x$month, function(y) length(unique(y))))), 
                kg=sum(x$kg,na.rm=T)))
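
The formula above evaluates the same `tapply` twice. One way to avoid that, sketched here with a hypothetical helper name (`se_species_per_month` is not part of the original answer), is to compute the monthly counts once and reuse them:

```r
# Hypothetical helper: standard error of the number of unique species per month
se_species_per_month <- function(species, month) {
  # Number of unique species in each month (computed once)
  n_per_month <- tapply(species, month, function(y) length(unique(y)))
  # Standard error = standard deviation / sqrt(number of months)
  sd(n_per_month) / sqrt(length(n_per_month))
}

# Toy check (made-up data): the monthly counts are 1, 3 and 1
se_toy <- se_species_per_month(c("a", "a", "a", "b", "c", "b"),
                               c(1, 1, 2, 2, 2, 3))
```

Inside the ddply call this would be used as `quality = se_species_per_month(x$species, x$month)`. Note that a group covering only a single month yields `NA`, because `sd` of one value is undefined.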

Source

2014-02-27 Wave

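If I understand correctly, you are simply using `tapply` incorrectly. Try `sd(tapply(x$species, x$month, function(y) length(unique(y))))`. – shadow

It is not clear how you define your quality criterion, so here is how I would do it. First, I define the quality criterion in a separate function. Note that your function should return a single value rather than a vector (in your attempt, you are using tapply, which returns a vector).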

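## Returns the mean, across months, of the per-month sd of the species codes.
## Assumes `species` is a factor (the data.frame default before R 4.0.0);
## on newer versions, run d$species <- factor(d$species) first.
get_quality <- 
    function(species,month) 
    mean(tapply(species,month, 
       FUN=function(s)sd(as.integer(s))), 
    na.rm=TRUE)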

Then I use it inside ddply. To simplify the code, I also create a function that is applied per group.

ff <- 
function(x) { 
    cbind(n_species=length(unique(x$species)), 
     quality= get_quality(x$species,x$month), 
     kg=sum(x$kg,na.rm=TRUE)) 
} 
library(plyr) 

s=ddply(d,.(year,species_group),ff) 


  year species_group n_species   quality     kg
1 2000             A         5 0.4000000 259.68
2 2000             B         7 0.2857143 318.24
3 2001             A         5 0.4000000 285.07
4 2001             B         7 0.2857143 351.54
5 2002             A         5 0.4000000 272.46
6 2002             B         7 0.2857143 331.45

Source

2014-02-27 10:26:07 agstudy
