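Group data before sapply in R
I'm calculating statistics for a set of water quality parameters across multiple groups, and I want to group the data before applying the sapply function.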
Below is an example data.frame:
site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1", "Comm HR", "Comm 1",
"Trans HR", "Trans 1")
flow <- c(2,21,3,5,2.1,22,.02,.2)
Pb <- c(200,3,42,3,4.2,55.3, 2,7)
TN <- c(5,22,1,2,4.5,3.4,2,3.2)
s <- data.frame(site, flow, Pb, TN)  # include site so the data can be grouped
And the statistics I need to calculate:
# Apply only to the numeric columns; x is one column (not s, to avoid shadowing the data frame)
stats <- sapply(s[c("flow", "Pb", "TN")], function(x) {
  x <- x[!is.na(x)]                               # drop NAs once so every statistic uses the same values
  n <- length(x)
  half_ci <- qnorm(0.975) * sd(x) / sqrt(n)       # normal-approximation 95% CI half-width
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # unnamed, so the row labels stay clean
  c("n" = n,
    "Mean" = mean(x),
    "Standard Deviation" = sd(x),
    "Coefficient of Variation" = sd(x) / mean(x),
    "Lower 95% Confidence Limit about Mean" = mean(x) - half_ci,
    "Upper 95% Confidence Limit about Mean" = mean(x) + half_ci,
    "Lower Quantile (25th percentile)" = q[1],
    "Median" = median(x),
    "Upper Quantile (75th percentile)" = q[2],
    "Inter Quartile Range" = q[2] - q[1],
    "Minimum Detected Value" = min(x),
    "Maximum Detected Value" = max(x))
})
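Instead of computing these statistics for all sites together, I would like the data grouped by site. The desired output looks like the following, but for each of the 4 different sites (i.e. these statistics 4 times):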
                                            flow        Pb         TN
n                                      8.0000000  8.000000  8.0000000
Mean                                   6.9150000 39.562500  5.3875000
Standard Deviation                     9.1410581 68.022264  6.8436493
Coefficient of Variation               1.3219173  1.719362  1.2702829
Lower 95% Confidence Limit about Mean  0.5806863 -7.573658  0.6451801
Upper 95% Confidence Limit about Mean 13.2493137 86.698658 10.1298199
dplyr is a package designed to do this kind of operation with simple syntax. Check out summarise: https://www.rdocumentation.org/packages/dplyr/versions/0.7.3/topics/summarise
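The summarise() suggestion can be sketched as follows. This is a minimal example, assuming dplyr 1.0 or later (across() is not part of the 0.7.3 release linked in the comment) and a data frame that includes the site column. Only three of the statistics are computed; the others could be added to the list in the same way.

```r
library(dplyr)

site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1", "Comm HR", "Comm 1",
          "Trans HR", "Trans 1")
flow <- c(2, 21, 3, 5, 2.1, 22, .02, .2)
Pb   <- c(200, 3, 42, 3, 4.2, 55.3, 2, 7)
TN   <- c(5, 22, 1, 2, 4.5, 3.4, 2, 3.2)
s <- data.frame(site, flow, Pb, TN)

# One row per site; columns are named <variable>_<statistic>, e.g. flow_mean
res <- s %>%
  group_by(site) %>%
  summarise(across(c(flow, Pb, TN),
                   list(n    = ~ sum(!is.na(.x)),
                        mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE))))
print(res)
```

This produces a wide table (one row per site, one column per variable/statistic pair) rather than the statistic-by-variable matrix shown in the question.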
You could use 'split' to get a list of data.frames, then 'lapply' over each list element to compute the results. This returns a list of matrices containing your statistics. – lmo
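A minimal sketch of the split/lapply approach. It assumes the site column has been added to the data frame; site_stats() is a hypothetical helper holding a shortened version of the question's statistics, and the full function body could be dropped in instead.

```r
site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1", "Comm HR", "Comm 1",
          "Trans HR", "Trans 1")
flow <- c(2, 21, 3, 5, 2.1, 22, .02, .2)
Pb   <- c(200, 3, 42, 3, 4.2, 55.3, 2, 7)
TN   <- c(5, 22, 1, 2, 4.5, 3.4, 2, 3.2)
s <- data.frame(site, flow, Pb, TN)

# Hypothetical helper: a subset of the question's statistics for one numeric vector
site_stats <- function(x) {
  x <- x[!is.na(x)]
  c("n" = length(x),
    "Mean" = mean(x),
    "Standard Deviation" = sd(x),
    "Median" = median(x),
    "Minimum Detected Value" = min(x),
    "Maximum Detected Value" = max(x))
}

# split() returns a list of data frames, one per site
by_site <- split(s[c("flow", "Pb", "TN")], s$site)

# lapply() runs the original sapply() on each piece, giving one matrix per site
stats_by_site <- lapply(by_site, function(d) sapply(d, site_stats))
stats_by_site[["Comm HR"]]
```

Each element of stats_by_site has the same layout as the question's stats matrix; split() orders the list alphabetically by site.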
@lmo ... why does everyone forget 'by'? – Parfait
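The 'by' suggestion does the splitting and applying in a single call. A sketch under the same assumptions as before: site is added to the data frame, and site_stats() is a hypothetical helper with a shortened list of statistics.

```r
site <- c("Comm HR", "Comm 1", "Trans HR", "Trans 1", "Comm HR", "Comm 1",
          "Trans HR", "Trans 1")
flow <- c(2, 21, 3, 5, 2.1, 22, .02, .2)
Pb   <- c(200, 3, 42, 3, 4.2, 55.3, 2, 7)
TN   <- c(5, 22, 1, 2, 4.5, 3.4, 2, 3.2)
s <- data.frame(site, flow, Pb, TN)

# Hypothetical helper: a few of the question's statistics for one numeric vector
site_stats <- function(x) {
  x <- x[!is.na(x)]
  c("n" = length(x), "Mean" = mean(x), "Standard Deviation" = sd(x))
}

# by() splits the rows by site and passes each subset data frame to the function
res <- by(s[c("flow", "Pb", "TN")], s$site, function(d) sapply(d, site_stats))
res  # prints one statistics matrix per site, separated by dashed lines
```

Individual sites can be pulled out by name, e.g. res[["Trans HR"]].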