Nesting a sum and a mean in one aggregate to get a score per group

I couldn't find a dataset similar to my problem, so I modified the iris dataset (a dataset built into R) to look similar. Close enough!

data = iris 
data$type = gl(5,30,150,labels=c("group1","group2","group3","group4","group5")) 
data$ID = gl(30,5,150)

Then I used the following code:

xtabs(Sepal.Length ~ Species + type, aggregate(Sepal.Length ~ Species + type + ID, data, mean))

which results in:

            type
Species      group1 group2 group3 group4 group5
  setosa      30.16  19.90   0.00   0.00   0.00
  versicolor   0.00  12.20  35.88  11.28   0.00
  virginica    0.00   0.00   0.00  26.24  39.64

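My understanding is that my code adds up Sepal.Length for each ID and then takes the mean of those values for each species and type.

Is this correct?

If not, how can I get this?

Also, if my data were such that each ID had multiple types, how would I get this? (I couldn't figure out how to construct that in R.)

Actually, just to be clear:

What I want is to sum Sepal.Length for each ID and type, then take the mean of all of those per-ID sums, and report that mean summed Sepal.Length by type and species.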


2016-09-28 k BORT

With data.table:

library(data.table) 
setDT(data) 

#sum of Sepal.Length for each ID AND type 
data[, id_type_sum := sum(Sepal.Length), by = .(ID, type)] 

# mean of this variable by type and species 
data[, mean(id_type_sum), by = .(type, Species)] 

#      type    Species       V1
# 1: group1     setosa 25.13333
# 2: group2     setosa 24.87500
# 3: group2 versicolor 30.50000
# 4: group3 versicolor 29.90000
# 5: group4 versicolor 28.20000
# 6: group4  virginica 32.80000
# 7: group5  virginica 33.03333
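
As a cross-check that needs no extra packages, the same two steps can be sketched in base R. Note that the code in the question does the reverse: aggregate() takes the mean within each ID, and xtabs() then sums those means in each cell. The sketch below assumes, as in the example data, that each ID sits within a single Species and type; the names id_sums and group_means are only illustrative.

```r
# Rebuild the example data from the question (iris ships with base R)
data <- iris
data$type <- gl(5, 30, 150, labels = paste0("group", 1:5))
data$ID <- gl(30, 5, 150)

# Step 1: sum Sepal.Length within each ID and type
# (Species is carried along so it is available in step 2)
id_sums <- aggregate(Sepal.Length ~ Species + type + ID, data, sum)

# Step 2: average those per-ID sums within each type/Species pair
group_means <- aggregate(Sepal.Length ~ Species + type, id_sums, mean)

# Optional: lay the result out as a Species-by-type table
xtabs(Sepal.Length ~ Species + type, group_means)
```

Unlike dcast, xtabs() fills Species/type combinations that have no data with 0 rather than NA.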

If you want the data.table result as a Species-by-type table, you can use data.table's dcast method:

library(magrittr) # for the %>% operator
data[, mean(id_type_sum), by = .(type, Species)] %>%
    dcast(Species ~ type, value.var = "V1") # V1 is the default name of the unnamed mean column

Result:

      Species   group1 group2 group3 group4   group5
1:     setosa 25.13333 24.875     NA     NA       NA
2: versicolor       NA 30.500   29.9   28.2       NA
3:  virginica       NA     NA     NA   32.8 33.03333
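
For the follow-up case where each ID has several types, here is one way such data might be constructed and summarised. This is a sketch under assumptions: data2, id_type and result are hypothetical names, and the ID layout (15 IDs, each with two rows in every type block) is just one arbitrary way to make IDs span types. It also collapses to one row per ID/type before averaging, so each ID counts once in the mean even when IDs have different numbers of rows.

```r
library(data.table)

# Hypothetical variant of the example data: the ID pattern 1,1,2,2,...,15,15
# repeats every 30 rows, so each ID has two rows in every type block
data2 <- as.data.table(iris)
data2[, type := gl(5, 30, 150, labels = paste0("group", 1:5))]
data2[, ID := gl(15, 2, 150)]

# Check that each ID really spans several types
data2[, uniqueN(type), by = ID]

# Sum per ID and type, keeping one row per combination
# (Species is included so it is available for the next step)
id_type <- data2[, .(id_type_sum = sum(Sepal.Length)), by = .(ID, type, Species)]

# Average the per-ID sums within each type/Species pair
result <- id_type[, .(mean_sum = mean(id_type_sum)), by = .(type, Species)]
result
```

Grouping by both ID and type is what keeps the sums separate when one ID appears under more than one type.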


2016-09-28 21:25:32 arvi1000

I used this code on my actual data and the numbers look like what I expected! Thank you so much, this is great

You're welcome! 'data.table' is a great package – arvi1000
