Calculating subtotals (sum, stdev, average, etc.)

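I have been searching for this for a while, but so far I have not been able to find a clear answer. I have probably been searching for the wrong terms, but maybe somebody here can quickly help me. The question is a basic one.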

A sample data set:

set <- structure(list(VarName = structure(c(1L, 5L, 4L, 2L, 3L), 
.Label = c("Apple/Blue/Nice", 
"Apple/Blue/Ugly", "Apple/Pink/Ugly", "Kiwi/Blue/Ugly", "Pear/Blue/Ugly" 
), class = "factor"), Color = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("Blue", 
"Pink"), class = "factor"), Qty = c(45L, 34L, 46L, 21L, 38L)), .Names = c("VarName", 
"Color", "Qty"), class = "data.frame", row.names = c(NA, -5L))

This gives a data set like:

set

          VarName Color Qty
1 Apple/Blue/Nice  Blue  45
2  Pear/Blue/Ugly  Blue  34
3  Kiwi/Blue/Ugly  Blue  46
4 Apple/Blue/Ugly  Blue  21
5 Apple/Pink/Ugly  Pink  38

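What I would like to do is fairly straightforward. I would like to sum (or average, or take the stdev of) the Qty column. But I would also like to do the same operation under the following conditions:

VarName includes 'Apple'
VarName includes 'Ugly'
Color equals 'Blue'

Could anyone give me a quick introduction on how to perform this kind of calculation?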

I am aware that some of it can be done with the aggregate() function, e.g.:

aggregate(set[3], FUN=sum, by=set[2])[1,2]

However, I believe there must be a more straightforward way of doing this. Are there filters that can be added to functions like sum()?


2012-09-27 Jochem

Is this what you're looking for?

# sum for those including 'Apple' 
apple <- set[grep('Apple', set[, 'VarName']), ] 
aggregate(apple[3], FUN=sum, by=apple[2]) 
  Color Qty
1  Blue  66
2  Pink  38

# sum for those including 'Ugly' 
ugly <- set[grep('Ugly', set[, 'VarName']), ] 
aggregate(ugly[3], FUN=sum, by=ugly[2]) 
  Color Qty
1  Blue 101
2  Pink  38

# sum for Color==Blue 
sum(set[set[, 'Color']=='Blue', 3]) 
[1] 146

The last sum can also be computed using subset:

sum(subset(set, Color=='Blue')[,3])
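The three conditions from the question can also be written as logical filters that are applied to Qty before calling sum(), mean() or sd(). This is only a minimal sketch: the data frame is rebuilt with data.frame() so the block runs on its own, and the helper name subtotals is made up for illustration.

```r
# Rebuild the example data so this block is self-contained
set <- data.frame(
  VarName = c("Apple/Blue/Nice", "Pear/Blue/Ugly", "Kiwi/Blue/Ugly",
              "Apple/Blue/Ugly", "Apple/Pink/Ugly"),
  Color   = c("Blue", "Blue", "Blue", "Blue", "Pink"),
  Qty     = c(45L, 34L, 46L, 21L, 38L)
)

# One logical filter per condition (grepl returns TRUE/FALSE for each row)
is_apple <- grepl("Apple", set$VarName, fixed = TRUE)
is_ugly  <- grepl("Ugly",  set$VarName, fixed = TRUE)
is_blue  <- set$Color == "Blue"

# Hypothetical helper: sum, mean and standard deviation of the filtered Qty
subtotals <- function(keep) {
  q <- set$Qty[keep]
  c(sum = sum(q), mean = mean(q), sd = sd(q))
}

subtotals(is_apple)
subtotals(is_ugly)
subtotals(is_blue)
```

Unlike the aggregate() calls above, this does not split the result by Color; each call summarises every row that matches the filter.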


2012-09-27 10:09:53

The easiest way is to split up your VarName column; subsetting then becomes very easy. So, let's create an object in which VarName has been separated:

##There must(?) be a better way than this. Anyone? 
new_set = t(as.data.frame(sapply(as.character(set$VarName), strsplit, "/")))

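Quick explanation:

We use as.character because set$VarName is a factor
sapply takes each value in turn and applies strsplit
The strsplit function splits up the elements
We convert to a data frame
Transpose to get the correct rotation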

Next,

##Convert to a data frame 
new_set = as.data.frame(new_set) 
##Make nice rownames - not actually needed 
rownames(new_set) = 1:nrow(new_set) 
##Add in the Qty column 
new_set$Qty = set$Qty

This gives

R> new_set
     V1   V2   V3 Qty
1 Apple Blue Nice  45
2  Pear Blue Ugly  34
3  Kiwi Blue Ugly  46
4 Apple Blue Ugly  21
5 Apple Pink Ugly  38
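As one possible answer to the "better way" comment in the code, the same table can probably be built by binding the strsplit results row-wise with do.call(rbind, ...). This is a sketch, not the answerer's method; it assumes every VarName has exactly three parts separated by "/", and it rebuilds the example data so it runs on its own.

```r
# Rebuild the example data so this block is self-contained
set <- data.frame(
  VarName = c("Apple/Blue/Nice", "Pear/Blue/Ugly", "Kiwi/Blue/Ugly",
              "Apple/Blue/Ugly", "Apple/Pink/Ugly"),
  Qty     = c(45L, 34L, 46L, 21L, 38L)
)

# strsplit returns a list with one character vector per row
parts <- strsplit(as.character(set$VarName), "/", fixed = TRUE)

# Stack the vectors into a matrix (one row per original row),
# then convert to a data frame; the columns are named V1, V2, V3
new_set <- as.data.frame(do.call(rbind, parts))

# Add in the Qty column
new_set$Qty <- set$Qty
new_set
```

If some VarName values had fewer or more parts, rbind would recycle or misalign values, so that assumption is worth checking on real data.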

Now all the operations are standard. For example,

##Add up all blue Qtys 
sum(new_set[new_set$V2 == "Blue",]$Qty) 
[1] 146 

##Average of Blue and Ugly Qtys 
mean(new_set[new_set$V2 == "Blue" & new_set$V3 == "Ugly",]$Qty) 
[1] 33.67

Also, once the data is in the correct form, you can use ddply to do everything you want (and more):

library(plyr) 
##Split the data frame up by V1 and take the mean of Qty 
ddply(new_set, .(V1), summarise, m = mean(Qty)) 

##Split the data frame up by V1 & V2 and take the mean of Qty 
ddply(new_set, .(V1, V2), summarise, m = mean(Qty))
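If plyr is not available, a similar per-group summary can be sketched in base R with the formula interface of aggregate(). The example below is an illustration only: it rebuilds new_set by hand and returns sum, mean and sd together for each value of V1.

```r
# Rebuild the split data so this block is self-contained
new_set <- data.frame(
  V1  = c("Apple", "Pear", "Kiwi", "Apple", "Apple"),
  V2  = c("Blue", "Blue", "Blue", "Blue", "Pink"),
  V3  = c("Nice", "Ugly", "Ugly", "Ugly", "Ugly"),
  Qty = c(45L, 34L, 46L, 21L, 38L)
)

# One row per V1 group; FUN returns three values, so stats$Qty is a
# matrix with columns sum, mean and sd.
# sd is NA for groups that contain a single row.
stats <- aggregate(Qty ~ V1, data = new_set,
                   FUN = function(x) c(sum = sum(x), mean = mean(x), sd = sd(x)))
stats
```

Grouping by two columns works the same way with Qty ~ V1 + V2 on the left of data = new_set.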


2012-09-27 10:08:05 csgillespie

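Nicely explained, +1.

Thanks for the explanation. While learning from it, I noticed something. This seems to give a NaN answer: mean(new_set[new_set$V2 == "Blue" && new_set$V3 == "Ugly",]$Qty). Not sure why this happens. – Jochem

@Jochem Oops, I had '&&' instead of '&'. '&&' doesn't work with vectors. – csgillespie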
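The difference between the two operators can be seen on short vectors. This is a small sketch with made-up variable names (v2, v3, qty). As far as I know, older R versions silently used only the first element of each side of '&&', while R 4.3.0 and later raise an error for longer vectors, so the second part mimics the old behaviour explicitly.

```r
v2  <- c("Blue", "Blue", "Blue", "Blue", "Pink")
v3  <- c("Nice", "Ugly", "Ugly", "Ugly", "Ugly")
qty <- c(45, 34, 46, 21, 38)

# '&' is element-wise: one TRUE/FALSE per row
keep <- v2 == "Blue" & v3 == "Ugly"
keep
mean(qty[keep])

# '&&' expects single values. Comparing only the first elements
# reproduces what older R versions did: TRUE && FALSE is FALSE,
# so no rows are selected and the mean of an empty vector is NaN.
first_only <- (v2 == "Blue")[1] && (v3 == "Ugly")[1]
mean(qty[first_only])
```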
