Truncated mean in Redshift
How can I calculate a truncated mean in a Redshift database? I want it to run on a very large data set.
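Redshift includes the common SQL statistical functions, including NTILE, which is the one you need here. Bucket the rows into quartiles and average only the middle two: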
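SELECT AVG(CASE WHEN quartile IN (2, 3) THEN my_metric ELSE NULL END) AS central_mean -- mean of the middle two quartiles
     , AVG(my_metric) AS mean -- plain mean, for comparison
FROM (SELECT my_metric
           , NTILE(4) OVER (ORDER BY my_metric) AS quartile -- rank rows by the metric being averaged
      FROM (SELECT * FROM my_table LIMIT 1000) t -- 1000-row sample; drop the LIMIT to use the full table
     ) q
;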
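You can get the threshold values for the percentiles you want to trim, filter out the metric values that fall outside those thresholds, and then compute the average.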
SELECT AVG(your_metric) AS truncated_mean
FROM (
    SELECT
        your_metric,
        PERCENTILE_DISC(0.1) -- 10% lower boundary
            WITHIN GROUP (ORDER BY your_metric) OVER () AS lower_threshold,
        PERCENTILE_DISC(0.9) -- 90% upper boundary
            WITHIN GROUP (ORDER BY your_metric) OVER () AS higher_threshold
    FROM your_table
) t1
-- Strict comparisons also drop rows equal to a threshold value.
WHERE your_metric > lower_threshold AND your_metric < higher_threshold;
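Since the question mentions a very large data set, one possible variation is to compute the two thresholds once in a single-row subquery and join them back, rather than carrying them on every row through a window function. The sketch below assumes the same placeholder names (your_table, your_metric) and uses the aggregate form of PERCENTILE_CONT; the CTE name thresholds and the output alias truncated_mean are made up for illustration. PERCENTILE_CONT interpolates between values, so the boundaries can differ slightly from the PERCENTILE_DISC version above. I have not benchmarked this, so treat it as a starting point rather than a tuned query.

```sql
-- Sketch only: your_table / your_metric are placeholders from the answer above;
-- "thresholds" and "truncated_mean" are hypothetical names.
WITH thresholds AS (
    SELECT
        -- Both calls share the same ORDER BY expression.
        PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY your_metric) AS lower_threshold,
        PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY your_metric) AS higher_threshold
    FROM your_table
)
SELECT AVG(t.your_metric) AS truncated_mean
FROM your_table t
CROSS JOIN thresholds th  -- thresholds has exactly one row
WHERE t.your_metric > th.lower_threshold
  AND t.your_metric < th.higher_threshold;
```

If an approximate answer is acceptable, it may be worth checking whether an approximate percentile function is a better fit for your data volume.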