2017-10-12 45 views

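Group data before sapply in R

I am calculating statistics for a set of water-quality parameters measured across several groups (sites). I want to group the data before applying the sapply function.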

Here is an example data.frame:

site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1", "Comm HR", "Comm 1", 
     "Trans HR", "Trans 1") 
flow <- c(2,21,3,5,2.1,22,.02,.2) 
Pb <- c(200,3,42,3,4.2,55.3, 2,7) 
TN <- c(5,22,1,2,4.5,3.4,2,3.2) 
s <- data.frame(flow,Pb,TN) 

and the statistics I need to compute:

# x is one column of s; na.rm = TRUE is used throughout so missing values are ignored consistently
stats <- sapply(s, function(x) c(
    "n" = sum(!is.na(x)),
    "Mean" = mean(x, na.rm = TRUE),
    "Standard Deviation" = sd(x, na.rm = TRUE),
    "Coefficient of Variation" = sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE),
    "Lower 95% Confidence Limit about Mean" = mean(x, na.rm = TRUE) - (qnorm(0.975) * sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))),
    "Upper 95% Confidence Limit about Mean" = mean(x, na.rm = TRUE) + (qnorm(0.975) * sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))),
    "Lower Quantile (25th percentile)" = quantile(x, 0.25, na.rm = TRUE),
    "Median" = median(x, na.rm = TRUE),
    "Upper Quantile (75th percentile)" = quantile(x, 0.75, na.rm = TRUE),
    "Inter Quartile Range" = (quantile(x, 0.75, na.rm = TRUE) - quantile(x, 0.25, na.rm = TRUE)),
    "Minimum Detected Value" = min(x, na.rm = TRUE),
    "Maximum Detected Value" = max(x, na.rm = TRUE))
)
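Because the same twelve statistics will be needed once per site, it may help to wrap them in a named function first. The sketch below assumes a hypothetical helper called water_stats (not part of the original question); it drops NA values once up front and uses quantile(..., names = FALSE) so the row names stay clean. With no missing values, applying it with sapply to the example data should give the same numbers as the code above.

```r
# Hypothetical helper: the question's summary statistics wrapped in one
# function so they can be reused for any subset of the data (e.g. one site).
water_stats <- function(x) {
  x <- x[!is.na(x)]                        # drop missing values once, up front
  n <- length(x)
  m <- mean(x)
  sdev <- sd(x)
  half <- qnorm(0.975) * sdev / sqrt(n)    # normal-approximation CI half-width
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  c("n" = n,
    "Mean" = m,
    "Standard Deviation" = sdev,
    "Coefficient of Variation" = sdev / m,
    "Lower 95% Confidence Limit about Mean" = m - half,
    "Upper 95% Confidence Limit about Mean" = m + half,
    "Lower Quantile (25th percentile)" = q[1],
    "Median" = median(x),
    "Upper Quantile (75th percentile)" = q[2],
    "Inter Quartile Range" = q[2] - q[1],
    "Minimum Detected Value" = min(x),
    "Maximum Detected Value" = max(x))
}

flow <- c(2, 21, 3, 5, 2.1, 22, .02, .2)
Pb   <- c(200, 3, 42, 3, 4.2, 55.3, 2, 7)
TN   <- c(5, 22, 1, 2, 4.5, 3.4, 2, 3.2)
s <- data.frame(flow, Pb, TN)

# One column per parameter, one row per statistic (a 12 x 3 matrix)
stats <- sapply(s, water_stats)
```

Keeping the statistics in one function means the grouping step only has to decide which rows go to water_stats, not how the statistics are computed.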

Rather than computing these statistics for all sites together, I want the data grouped by site. The desired output looks like the one below, but for each of the 4 different sites (i.e. these statistics 4 times):

                                             flow          Pb          TN
n                                       8.0000000    8.000000   8.0000000
Mean                                    6.9150000   39.562500   5.3875000
Standard Deviation                      9.1410581   68.022264   6.8436493
Coefficient of Variation                1.3219173    1.719362   1.2702829
Lower 95% Confidence Limit about Mean   0.5806863   -7.573658   0.6451801
Upper 95% Confidence Limit about Mean  13.2493137   86.698658  10.1298199

dplyr is a package with a simple syntax for exactly this kind of operation. Check out summarise: https://www.rdocumentation.org/packages/dplyr/versions/0.7.3/topics/summarise –


You could use 'split' first to get a list of data.frames, then 'lapply' over each list element to compute the results. This returns a list of matrices with your statistics. – lmo
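lmo's split-then-lapply idea can be sketched as follows. To keep it short, few_stats is a hypothetical, cut-down statistics function (n, mean and standard deviation only); in practice you would substitute the full set of statistics from the question.

```r
site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1",
          "Comm HR", "Comm 1", "Trans HR", "Trans 1")
flow <- c(2, 21, 3, 5, 2.1, 22, .02, .2)
Pb   <- c(200, 3, 42, 3, 4.2, 55.3, 2, 7)
TN   <- c(5, 22, 1, 2, 4.5, 3.4, 2, 3.2)
s <- data.frame(site, flow, Pb, TN, stringsAsFactors = FALSE)

# Hypothetical, shortened statistics function (stand-in for the full list)
few_stats <- function(x) c(n = sum(!is.na(x)),
                           Mean = mean(x, na.rm = TRUE),
                           SD = sd(x, na.rm = TRUE))

# 1. split() turns the measurement columns into a list of data frames,
#    one element per site, named by site
by_site <- split(s[, c("flow", "Pb", "TN")], s$site)

# 2. lapply() runs sapply() on each site's data frame, giving a named
#    list of matrices (statistics in rows, parameters in columns)
stats_by_site <- lapply(by_site, function(df) sapply(df, few_stats))
```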


@lmo ... why does everyone forget 'by'? – Parfait

Answer


Consider by, using the site column as the subsetting group. Then pass all columns after the first (site) to sapply:

s <- data.frame(site, flow, Pb, TN, stringsAsFactors = FALSE)

# by() subsets s by site and calls the function once per site's data frame
stats_list <- by(s, s$site, FUN = function(df) {
    # apply the statistics to every column except the first (site)
    sapply(df[2:ncol(df)], function(i)
        c("n" = sum(!is.na(i)),
          "Mean" = mean(i, na.rm = TRUE),
          "Standard Deviation" = sd(i, na.rm = TRUE),
          "Coefficient of Variation" = sd(i, na.rm = TRUE) / mean(i, na.rm = TRUE),
          "Lower 95% Confidence Limit about Mean" = mean(i, na.rm = TRUE) - (qnorm(0.975) * sd(i, na.rm = TRUE) / sqrt(sum(!is.na(i)))),
          "Upper 95% Confidence Limit about Mean" = mean(i, na.rm = TRUE) + (qnorm(0.975) * sd(i, na.rm = TRUE) / sqrt(sum(!is.na(i)))),
          "Lower Quantile (25th percentile)" = quantile(i, 0.25, na.rm = TRUE),
          "Median" = median(i, na.rm = TRUE),
          "Upper Quantile (75th percentile)" = quantile(i, 0.75, na.rm = TRUE),
          "Inter Quartile Range" = (quantile(i, 0.75, na.rm = TRUE) - quantile(i, 0.25, na.rm = TRUE)),
          "Minimum Detected Value" = min(i, na.rm = TRUE),
          "Maximum Detected Value" = max(i, na.rm = TRUE))
    )
})

Output (a list with one named element per site):

stats_list 

s$site: Comm 1
                                             flow          Pb          TN
n                                      2.00000000    2.000000    2.000000
Mean                                  21.50000000   29.150000   12.700000
Standard Deviation                     0.70710678   36.981685   13.152186
Coefficient of Variation               0.03288869    1.268668    1.035605
Lower 95% Confidence Limit about Mean 20.52001801  -22.103058   -5.527665
Upper 95% Confidence Limit about Mean 22.47998199   80.403058   30.927665
Lower Quantile (25th percentile).25%  21.25000000   16.075000    8.050000
Median                                21.50000000   29.150000   12.700000
Upper Quantile (75th percentile).75%  21.75000000   42.225000   17.350000
Inter Quartile Range.75%               0.50000000   26.150000    9.300000
Minimum Detected Value                21.00000000    3.000000    3.400000
Maximum Detected Value                22.00000000   55.300000   22.000000
----------------------------------------------------------------------------------------- 
s$site: Comm HR
                                             flow          Pb          TN
n                                      2.00000000    2.000000  2.00000000
Mean                                   2.05000000  102.100000  4.75000000
Standard Deviation                     0.07071068  138.451508  0.35355339
Coefficient of Variation               0.03449301    1.356038  0.07443229
Lower 95% Confidence Limit about Mean  1.95200180  -89.780474  4.26000900
Upper 95% Confidence Limit about Mean  2.14799820  293.980474  5.23999100
Lower Quantile (25th percentile).25%   2.02500000   53.150000  4.62500000
Median                                 2.05000000  102.100000  4.75000000
Upper Quantile (75th percentile).75%   2.07500000  151.050000  4.87500000
Inter Quartile Range.75%               0.05000000   97.900000  0.25000000
Minimum Detected Value                 2.00000000    4.200000  4.50000000
Maximum Detected Value                 2.10000000  200.000000  5.00000000
----------------------------------------------------------------------------------------- 
s$site: Trans 1
                                             flow          Pb          TN
n                                        2.000000   2.0000000   2.0000000
Mean                                     2.600000   5.0000000   2.6000000
Standard Deviation                       3.394113   2.8284271   0.8485281
Coefficient of Variation                 1.305428   0.5656854   0.3263570
Lower 95% Confidence Limit about Mean   -2.103914   1.0800720   1.4240216
Upper 95% Confidence Limit about Mean    7.303914   8.9199280   3.7759784
Lower Quantile (25th percentile).25%     1.400000   4.0000000   2.3000000
Median                                   2.600000   5.0000000   2.6000000
Upper Quantile (75th percentile).75%     3.800000   6.0000000   2.9000000
Inter Quartile Range.75%                 2.400000   2.0000000   0.6000000
Minimum Detected Value                   0.200000   3.0000000   2.0000000
Maximum Detected Value                   5.000000   7.0000000   3.2000000
----------------------------------------------------------------------------------------- 
s$site: Trans HR
                                             flow          Pb          TN
n                                        2.000000    2.000000   2.0000000
Mean                                     1.510000   22.000000   1.5000000
Standard Deviation                       2.107178   28.284271   0.7071068
Coefficient of Variation                 1.395482    1.285649   0.4714045
Lower 95% Confidence Limit about Mean   -1.410346  -17.199280   0.5200180
Upper 95% Confidence Limit about Mean    4.430346   61.199280   2.4799820
Lower Quantile (25th percentile).25%     0.765000   12.000000   1.2500000
Median                                   1.510000   22.000000   1.5000000
Upper Quantile (75th percentile).75%     2.255000   32.000000   1.7500000
Inter Quartile Range.75%                 1.490000   20.000000   0.5000000
Minimum Detected Value                   0.020000    2.000000   1.0000000
Maximum Detected Value                   3.000000   42.000000   2.0000000
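The object returned by by() can be indexed by site name, so one site's matrix can be pulled out with [[ ]], and all sites can be stacked into a single long data frame if that is easier to work with. The sketch below uses a hypothetical, shortened few_stats function in place of the full statistics; the column names site and statistic are illustrative choices, not part of the answer above.

```r
site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1",
          "Comm HR", "Comm 1", "Trans HR", "Trans 1")
flow <- c(2, 21, 3, 5, 2.1, 22, .02, .2)
Pb   <- c(200, 3, 42, 3, 4.2, 55.3, 2, 7)
TN   <- c(5, 22, 1, 2, 4.5, 3.4, 2, 3.2)
s <- data.frame(site, flow, Pb, TN, stringsAsFactors = FALSE)

# Hypothetical, shortened statistics function (stand-in for the full list)
few_stats <- function(x) c(n = sum(!is.na(x)),
                           Mean = mean(x, na.rm = TRUE),
                           SD = sd(x, na.rm = TRUE))

stats_list <- by(s, s$site, function(df) sapply(df[2:ncol(df)], few_stats))

# Pull out a single site's matrix by name
comm1 <- stats_list[["Comm 1"]]

# Stack every site into one long data frame, keeping site and statistic labels
combined <- do.call(rbind, lapply(names(stats_list), function(nm) {
  m <- stats_list[[nm]]
  data.frame(site = nm, statistic = rownames(m), m, row.names = NULL)
}))
```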

Great! That works. What is the best way to export this to Excel? – kslayerr


Since *stats_list* is a list of matrices, simply combine your Excel export method of choice with 'lapply' or a 'for' loop over its elements. – Parfait


Got it: 'for (i in seq_along(stats)) { write.csv(stats[[i]], filename) }' Thanks! – kslayerr
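If filename is the same on every pass of that loop, each write.csv call overwrites the previous file. A minimal sketch that writes one CSV per site is below; it assumes a hypothetical naming scheme stats_<site>.csv inside a temporary folder, and again uses a shortened, hypothetical few_stats function in place of the full statistics. CSV files open directly in Excel.

```r
site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1",
          "Comm HR", "Comm 1", "Trans HR", "Trans 1")
flow <- c(2, 21, 3, 5, 2.1, 22, .02, .2)
Pb   <- c(200, 3, 42, 3, 4.2, 55.3, 2, 7)
TN   <- c(5, 22, 1, 2, 4.5, 3.4, 2, 3.2)
s <- data.frame(site, flow, Pb, TN, stringsAsFactors = FALSE)

# Hypothetical, shortened statistics function (stand-in for the full list)
few_stats <- function(x) c(n = sum(!is.na(x)),
                           Mean = mean(x, na.rm = TRUE),
                           SD = sd(x, na.rm = TRUE))

stats_list <- by(s, s$site, function(df) sapply(df[2:ncol(df)], few_stats))

out_dir <- tempdir()   # assumption: write into a temporary folder
for (nm in names(stats_list)) {
  # Hypothetical naming scheme: "stats_<site>.csv", spaces replaced by "_"
  out_file <- file.path(out_dir, paste0("stats_", gsub(" ", "_", nm), ".csv"))
  # Row names (the statistic labels) are written as the first column
  write.csv(stats_list[[nm]], out_file)
}
```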