How can I get the median for each record?
There is no median function in SQL Server, so I'm using this great suggestion:
https://stackoverflow.com/a/2026609/117700
This computes the median over the entire dataset, but I need the median per record.
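For comparison, SQL Server 2012 and later include PERCENTILE_CONT, which with a fraction of 0.5 returns the median of a group (interpolating between the two middle values when the group has an even number of rows). In SQL Server it is used as a window function, so the sketch below partitions by client_id and uses DISTINCT to collapse the per-row output to one row per client. It assumes SQL Server 2012+ and an INT TimesTested column; the table definition is an assumption and the rows are the sample data from this question.

```sql
-- Assumed schema; rows copied from the sample data in the question.
CREATE TABLE counted3 (client_id INT NOT NULL, TimesTested INT NOT NULL);
INSERT INTO counted3 (client_id, TimesTested) VALUES
    (214220, 1), (215425, 1), (212839, 4), (215249, 1), (210498, 3),
    (110655, 1), (110655, 1), (110655, 12), (215425, 4), (100196, 1),
    (110032, 1), (110032, 1), (101944, 3), (1, 2), (1, 1);

-- SQL Server 2012+: PERCENTILE_CONT(0.5) gives the median of each partition.
-- As a window function it returns one value per input row, so DISTINCT
-- reduces the result to one row per client_id.
SELECT DISTINCT
       client_id,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY TimesTested)
           OVER (PARTITION BY client_id) AS median
FROM counted3
ORDER BY client_id;
```

With the sample rows, client 215425 (values 1 and 4) gets 2.5 and client 110655 (values 1, 1 and 12) gets 1.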
My dataset:
+-----------+-------------+
| client_id | TimesTested |
+-----------+-------------+
| 214220    | 1           |
| 215425    | 1           |
| 212839    | 4           |
| 215249    | 1           |
| 210498    | 3           |
| 110655    | 1           |
| 110655    | 1           |
| 110655    | 12          |
| 215425    | 4           |
| 100196    | 1           |
| 110032    | 1           |
| 110032    | 1           |
| 101944    | 3           |
| 1         | 2           |
| 1         | 1           |
+-----------+-------------+
Here is the query I'm using:
select client_id,
(
    SELECT
    (
        (SELECT MAX(TimesTested) FROM
            (SELECT TOP 50 PERCENT t.TimesTested
             FROM counted3 t
             where t.timestested>1
               and CLIENT_ID=t.CLIENT_ID
             ORDER BY t.TimesTested) AS BottomHalf)
        +
        (SELECT MIN(TimesTested) FROM
            (SELECT TOP 50 PERCENT t.TimesTested
             FROM counted3 t
             where t.timestested>1
               and CLIENT_ID=t.CLIENT_ID
             ORDER BY t.TimesTested DESC) AS TopHalf)
    )/2 AS Median
) TotalAvgTestFreq
from counted3
group by client_id
But it gives me strange results:
+-----------+------------------+
| client_id | median???????????|
+-----------+------------------+
| 100007    | 84               |
| 100008    | 84               |
| 100011    | 84               |
| 100014    | 84               |
| 100026    | 84               |
| 100027    | 84               |
| 100028    | 84               |
| 100029    | 84               |
| 100042    | 84               |
| 100043    | 84               |
| 100071    | 84               |
| 100072    | 84               |
| 100074    | 84               |
+-----------+------------------+
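Every client getting the same value comes from how the unqualified CLIENT_ID inside the subqueries is resolved: it binds to the nearest table in scope that has that column, which is t, so `CLIENT_ID=t.CLIENT_ID` compares t with itself and never refers to the outer row. Each subquery therefore sees the whole table (filtered only by timestested>1). Below is a minimal sketch of the same two-halves approach with the outer table aliased (c3, a name chosen here) so the inner predicates really correlate. It assumes TimesTested is an INT column, drops the timestested>1 filter so all of a client's values count (add it back to both subqueries if you need it), and divides by 2.0 so an even count is not truncated to an integer. TOP ... PERCENT makes this T-SQL only.

```sql
-- Assumed schema (TimesTested as INT); rows copied from the question.
CREATE TABLE counted3 (client_id INT NOT NULL, TimesTested INT NOT NULL);
INSERT INTO counted3 (client_id, TimesTested) VALUES
    (214220, 1), (215425, 1), (212839, 4), (215249, 1), (210498, 3),
    (110655, 1), (110655, 1), (110655, 12), (215425, 4), (100196, 1),
    (110032, 1), (110032, 1), (101944, 3), (1, 2), (1, 1);

-- Median = (largest value of the lower half + smallest value of the upper
-- half) / 2. TOP 50 PERCENT rounds up, so with an odd count the middle
-- value lands in both halves. Every inner predicate is qualified with the
-- outer alias c3, so each client only sees its own rows.
SELECT c3.client_id,
       ( (SELECT MAX(BottomHalf.TimesTested)
          FROM (SELECT TOP 50 PERCENT t.TimesTested
                FROM counted3 AS t
                WHERE t.client_id = c3.client_id
                ORDER BY t.TimesTested) AS BottomHalf)
       + (SELECT MIN(TopHalf.TimesTested)
          FROM (SELECT TOP 50 PERCENT t.TimesTested
                FROM counted3 AS t
                WHERE t.client_id = c3.client_id
                ORDER BY t.TimesTested DESC) AS TopHalf)
       ) / 2.0 AS Median
FROM counted3 AS c3
GROUP BY c3.client_id
ORDER BY c3.client_id;
```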
How can I get the median for each client_id?
I'm currently trying to use this awesome query from Aaron's site:
select c3.client_id, (
    SELECT AVG(1.0 * TimesTested) median
    FROM
    (
        SELECT o.TimesTested,
            rn = ROW_NUMBER() OVER (ORDER BY o.TimesTested), c.c
        FROM counted3 AS o
        CROSS JOIN (SELECT c = COUNT(*) FROM counted3) AS c
        where count>1
    ) AS x
    WHERE rn IN ((c + 1)/2, (c + 2)/2)
) a
from counted3 c3
group by c3.client_id
Unfortunately, as Richardthekiwi pointed out:
"It's a single median, whereas this question is about the average per partition."
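The query from Aaron's site numbers and counts the whole table (ROW_NUMBER() OVER (ORDER BY ...) and one COUNT(*)), which is why it yields a single median. A sketch of a per-partition version, assuming an INT TimesTested column and using the question's sample rows: both window functions get PARTITION BY client_id, so each client's rows are numbered and counted separately, and the outer GROUP BY averages the one or two middle rows per client. The alias cnt is a name chosen here. Any filter, such as the original `count>1` or `TimesTested > 1`, belongs in the inner query so it is applied before rows are numbered and counted. The query uses only standard window functions, so it should also run outside SQL Server (for example on SQLite 3.25 or later).

```sql
-- Assumed schema; rows copied from the sample data in the question.
CREATE TABLE counted3 (client_id INT NOT NULL, TimesTested INT NOT NULL);
INSERT INTO counted3 (client_id, TimesTested) VALUES
    (214220, 1), (215425, 1), (212839, 4), (215249, 1), (210498, 3),
    (110655, 1), (110655, 1), (110655, 12), (215425, 4), (100196, 1),
    (110032, 1), (110032, 1), (101944, 3), (1, 2), (1, 1);

-- Number and count each client's rows separately, then keep the middle
-- row (odd count) or the two middle rows (even count) and average them.
SELECT x.client_id,
       AVG(1.0 * x.TimesTested) AS median
FROM (
    SELECT o.client_id,
           o.TimesTested,
           ROW_NUMBER() OVER (PARTITION BY o.client_id
                              ORDER BY o.TimesTested) AS rn,
           COUNT(*)     OVER (PARTITION BY o.client_id) AS cnt
    FROM counted3 AS o
    -- a filter such as WHERE o.TimesTested > 1 would go here
) AS x
-- integer division: cnt = 3 keeps rn 2; cnt = 4 keeps rn 2 and 3
WHERE x.rn IN ((x.cnt + 1) / 2, (x.cnt + 2) / 2)
GROUP BY x.client_id
ORDER BY x.client_id;
```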
I'd like to know how I can join it to counted3 to get the median for each partition.
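One way to join the medians back to counted3 is to compute them once per client in a CTE and join on client_id, so every original row carries its client's median. A sketch under the same assumptions (INT TimesTested, the question's sample rows); client_median is a hypothetical CTE name.

```sql
-- Assumed schema; rows copied from the sample data in the question.
CREATE TABLE counted3 (client_id INT NOT NULL, TimesTested INT NOT NULL);
INSERT INTO counted3 (client_id, TimesTested) VALUES
    (214220, 1), (215425, 1), (212839, 4), (215249, 1), (210498, 3),
    (110655, 1), (110655, 1), (110655, 12), (215425, 4), (100196, 1),
    (110032, 1), (110032, 1), (101944, 3), (1, 2), (1, 1);

-- One median per client: average of the middle row(s) within each partition.
WITH client_median AS (
    SELECT x.client_id, AVG(1.0 * x.TimesTested) AS median
    FROM (
        SELECT client_id,
               TimesTested,
               ROW_NUMBER() OVER (PARTITION BY client_id
                                  ORDER BY TimesTested) AS rn,
               COUNT(*)     OVER (PARTITION BY client_id) AS cnt
        FROM counted3
    ) AS x
    WHERE x.rn IN ((x.cnt + 1) / 2, (x.cnt + 2) / 2)
    GROUP BY x.client_id
)
-- Join back so each counted3 row is returned with its client's median.
SELECT c3.client_id, c3.TimesTested, m.median
FROM counted3 AS c3
JOIN client_median AS m ON m.client_id = c3.client_id
ORDER BY c3.client_id, c3.TimesTested;
```

The result keeps all 15 sample rows; both rows of client 1 (values 2 and 1) show a median of 1.5.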
What do you mean by "median per record"? – Lamak
@Lamak I mean that I want the median of all the values for client_id 100007 –
If 100007 had TimesTested values 1,1,1,1,2,4,5,7,8,99, then the median would be 3 (the average of the two middle values, 2 and 4) –
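Working the middle-row formula from the query on Aaron's site through this example, with c = 10 sorted values, integer division for the row positions, and x_(k) denoting the k-th smallest value:

```latex
\begin{aligned}
&\text{sorted values: } 1, 1, 1, 1, 2, 4, 5, 7, 8, 99 \qquad (c = 10)\\
&\left\lfloor \frac{c+1}{2} \right\rfloor = 5, \qquad \left\lfloor \frac{c+2}{2} \right\rfloor = 6\\
&\operatorname{median} = \frac{x_{(5)} + x_{(6)}}{2} = \frac{2 + 4}{2} = 3
\end{aligned}
```

Taking only x_(6) = 4 would give the upper of the two middle values rather than the usual median.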