2012-10-29

How do I get the median for each record?

SQL Server has no median function, so I am using this great suggestion:

https://stackoverflow.com/a/2026609/117700

This calculates the median over the entire dataset, but I need the median for each record.

My dataset:

+-----------+-------------+
| client_id | TimesTested |
+-----------+-------------+
|    214220 |           1 |
|    215425 |           1 |
|    212839 |           4 |
|    215249 |           1 |
|    210498 |           3 |
|    110655 |           1 |
|    110655 |           1 |
|    110655 |          12 |
|    215425 |           4 |
|    100196 |           1 |
|    110032 |           1 |
|    110032 |           1 |
|    101944 |           3 |
|         1 |           2 |
|         1 |           1 |
+-----------+-------------+
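To make the difference between "one median for the whole dataset" and "one median per client_id" concrete, here is a small Python sketch using the standard library's statistics.median. It assumes the sample rows from the table above and applies no TimesTested > 1 filter.

```python
from collections import defaultdict
from statistics import median

# (client_id, TimesTested) pairs copied from the sample table above.
rows = [
    (214220, 1), (215425, 1), (212839, 4), (215249, 1), (210498, 3),
    (110655, 1), (110655, 1), (110655, 12), (215425, 4), (100196, 1),
    (110032, 1), (110032, 1), (101944, 3), (1, 2), (1, 1),
]

# One median over the whole dataset: what the linked answer computes.
overall = median([times for _, times in rows])

# One median per client_id: what the question is after.
values_by_client = defaultdict(list)
for client_id, times in rows:
    values_by_client[client_id].append(times)
per_client = {cid: median(vals) for cid, vals in values_by_client.items()}

print(overall)             # 1
print(per_client[215425])  # 2.5
print(per_client[110655])  # 1
```

The whole-table median collapses everything to a single value, while the per-client dictionary has one entry for each of the ten distinct client_id values.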

Here is the query I am using:

select client_id, 
    (
    SELECT 
    (
    (SELECT MAX(TimesTested) FROM 
     (SELECT TOP 50 PERCENT t.TimesTested 
     FROM counted3 t 
     where t.timestested>1 
     and CLIENT_ID=t.CLIENT_ID 
     ORDER BY t.TimesTested) AS BottomHalf) 
    + 
    (SELECT MIN(TimesTested) FROM 
     (SELECT TOP 50 PERCENT t.TimesTested 
     FROM counted3 t 
     where t.timestested>1 
     and CLIENT_ID=t.CLIENT_ID 
     ORDER BY t.TimesTested DESC) AS TopHalf) 
    )/2 AS Median 
    ) TotalAvgTestFreq 
from counted3 

group by client_id 

but it gives me strange data:

+-----------+-------------------+
| client_id | median??????????? |
+-----------+-------------------+
|    100007 |                84 |
|    100008 |                84 |
|    100011 |                84 |
|    100014 |                84 |
|    100026 |                84 |
|    100027 |                84 |
|    100028 |                84 |
|    100029 |                84 |
|    100042 |                84 |
|    100043 |                84 |
|    100071 |                84 |
|    100072 |                84 |
|    100074 |                84 |
+-----------+-------------------+

How can I get the median for each client_id?

I am currently trying to use this awesome query from Aaron's site:

select c3.client_id,(
    SELECT AVG(1.0 * TimesTested) median 
    FROM 
    (
     SELECT o.TimesTested , 
     rn = ROW_NUMBER() OVER (ORDER BY o.TimesTested), c.c 
     FROM counted3 AS o 
     CROSS JOIN (SELECT c = COUNT(*) FROM counted3) AS c 
     where count>1 
    ) AS x 
    WHERE rn IN ((c + 1)/2, (c + 2)/2) 
    ) a 
    from counted3 c3 
    group by c3.client_id 

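Unfortunately, as Richardthekiwi pointed out:

That gives a single median, whereas the question is about the median per partition.

I would like to know how I can join it to counted3 to get the median for each partition.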
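The single-median behaviour comes from the window: ROW_NUMBER() OVER (ORDER BY ...) and the cross-joined COUNT(*) both span the whole table, so WHERE rn IN (...) keeps the middle row(s) of the entire table once, and the outer query repeats that value for every client. Adding PARTITION BY client_id restarts the count for each client. The sketch below shows the two counts side by side with Python's sqlite3 module; it assumes SQLite 3.25 or later (for window functions) and uses a few rows taken from the sample table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counted3 (client_id INTEGER, TimesTested INTEGER)")
conn.executemany(
    "INSERT INTO counted3 VALUES (?, ?)",
    [(1, 2), (1, 1), (110655, 1), (110655, 1), (110655, 12)],
)

counts = conn.execute("""
    SELECT DISTINCT client_id, c_whole, c_per_client
    FROM (
        SELECT client_id,
               COUNT(*) OVER () AS c_whole,                           -- whole table
               COUNT(*) OVER (PARTITION BY client_id) AS c_per_client -- per client
        FROM counted3
    )
    ORDER BY client_id
""").fetchall()
conn.close()

print(counts)  # [(1, 5, 2), (110655, 5, 3)]
```

With c_whole, every client's row positions are compared against the same total of 5; with c_per_client, each client gets its own middle position.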

What do you mean by "the median for each record"? – Lamak

@Lamak I mean that I want the median of all the values for client_id 100007 –

If 100007 has the test counts 1,1,1,1,2,4,5,7,8,99, then the median is 3 (the mean of the two middle values, 2 and 4) –
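For an even number of values the median is the mean of the two middle ones; for the ten values in that comment, the 5th and 6th sorted values are 2 and 4, giving 3. The Python sketch below checks this with a hypothetical helper, median_by_row_number, that mirrors the 1-based row selection rn IN ((c + 1) / 2, (c + 2) / 2) from the query on Aaron's site, and compares it with statistics.median.

```python
from statistics import median

def median_by_row_number(values):
    """Median via 1-based row positions, mirroring
    WHERE rn IN ((c + 1) / 2, (c + 2) / 2) with integer division."""
    ordered = sorted(values)
    c = len(ordered)
    # For odd c both positions are the same middle row (a set, like SQL IN);
    # for even c they are the two middle rows.
    positions = {(c + 1) // 2, (c + 2) // 2}
    picked = [ordered[rn - 1] for rn in sorted(positions)]  # rn is 1-based
    return sum(picked) / len(picked)

tests = [1, 1, 1, 1, 2, 4, 5, 7, 8, 99]
print(median_by_row_number(tests))  # 3.0
print(median(tests))                # 3.0
```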

Answers

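Note: if testfreq is of type int or bigint, you need to CAST it before taking the average, otherwise you get integer division. For example, if 2 and 5 are the median records, (2+5)/2 => 3. Use AVG(Cast(testfreq as float)).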

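select client_id, avg(testfreq) median_testfreq
from
(
    select client_id,
      testfreq,
      rn=row_number() over (partition by CLIENT_ID
           order by testfreq),
      c=count(testfreq) over (partition by CLIENT_ID)
    from tbk
    where timestested>1
) g
where rn in (round(c/2.0,0),c/2+1)
group by client_id;

The median is either the middle record when the number of rows is odd, or the average of the two middle records when it is even. The condition rn in (round(c/2.0,0),c/2+1) selects the one or two records needed: for an odd c both expressions give the same middle row, and for an even c they give the two middle rows. Because c is an integer, the division inside round() has to be c/2.0; with c/2 the result is truncated before rounding, so for an odd count such as 3 the query would average rows 1 and 2 instead of returning row 2.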
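The partitioned approach can be tried outside SQL Server with Python's built-in sqlite3 module. This sketch assumes SQLite 3.25 or later (for window functions) and uses REAL where T-SQL would use float. The table name tbk and column testfreq follow the answer; the rows for 100007 come from the comment above, 215425 from the sample table, and 300001 is hypothetical. The timestested filter is omitted. The hypothetical helper medians runs the same query with two WHERE conditions: the integer-division-safe rn IN ((c + 1) / 2, (c + 2) / 2), and ROUND(c / 2, 0), c / 2 + 1 with an integer c.

```python
import sqlite3

# Sample rows: (client_id, testfreq).
ROWS = [(100007, v) for v in (1, 1, 1, 1, 2, 4, 5, 7, 8, 99)] + [
    (215425, 1), (215425, 4),                 # even count: mean of 1 and 4
    (300001, 2), (300001, 4), (300001, 9),    # odd count: middle value 4
]

QUERY = """
SELECT client_id, AVG(CAST(testfreq AS REAL)) AS median_testfreq  -- T-SQL: CAST(... AS float)
FROM (
    SELECT client_id,
           testfreq,
           ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY testfreq) AS rn,
           COUNT(testfreq) OVER (PARTITION BY client_id) AS c
    FROM tbk
) AS g
WHERE {condition}
GROUP BY client_id
ORDER BY client_id
"""

def medians(condition):
    """Run the per-partition median query with the given row filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbk (client_id INTEGER, testfreq INTEGER)")
    conn.executemany("INSERT INTO tbk VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY.format(condition=condition)).fetchall()
    conn.close()
    return result

# Integer division on both positions: odd counts select exactly one row.
print(medians("rn IN ((c + 1) / 2, (c + 2) / 2)"))
# [(100007, 3.0), (215425, 2.5), (300001, 4.0)]

# With an integer c, c / 2 truncates before ROUND, so odd counts pick two rows.
print(medians("rn IN (ROUND(c / 2, 0), c / 2 + 1)"))
# [(100007, 3.0), (215425, 2.5), (300001, 3.0)]
```

For client 300001 (three rows), the second condition averages rows 1 and 2 and returns 3.0 instead of 4.0; dividing by 2.0 inside ROUND, or using the (c + 1) / 2 form, avoids that.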

Thank you so much! –

Try this:

select client_id, 
    (
    SELECT 
    (
    (SELECT MAX(testfreq) FROM 
     (SELECT TOP 50 PERCENT t.testfreq 
     FROM counted3 t 
     where t.timestested>1 
     and c3.CLIENT_ID=t.CLIENT_ID 
     ORDER BY t.testfreq) AS BottomHalf) 
    + 
    (SELECT MIN(testfreq) FROM 
     (SELECT TOP 50 PERCENT t.testfreq 
     FROM counted3 t 
     where t.timestested>1 
     and c3.CLIENT_ID=t.CLIENT_ID 
     ORDER BY t.testfreq DESC) AS TopHalf) 
    )/2 AS Median 
    ) TotalAvgTestFreq 
from counted3 c3 

group by client_id 

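I added the c3 alias to the outer table and used it to qualify the outer CLIENT_ID reference. Without the alias, the unqualified CLIENT_ID in CLIENT_ID=t.CLIENT_ID resolves to t.CLIENT_ID, so the condition is always true and every client gets the same median computed over the whole table.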
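Why the original query returned 84 for every client can be reproduced in miniature with Python's sqlite3 module. SQLite has no TOP 50 PERCENT, so this sketch swaps in MAX as the per-client aggregate; the rows are taken from the sample table. It relies on SQLite resolving an unqualified column name against the innermost FROM clause first, which is also how SQL Server resolves it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counted3 (client_id INTEGER, TimesTested INTEGER)")
conn.executemany(
    "INSERT INTO counted3 VALUES (?, ?)",
    [(1, 2), (1, 1), (110655, 1), (110655, 1), (110655, 12)],
)

# Unqualified client_id inside the subquery binds to the inner table t,
# so the filter is t.client_id = t.client_id: true for every row.
unqualified = conn.execute("""
    SELECT client_id,
           (SELECT MAX(t.TimesTested) FROM counted3 t
            WHERE client_id = t.client_id) AS max_tested
    FROM counted3 c3
    GROUP BY client_id
    ORDER BY client_id
""").fetchall()

# Qualifying the outer reference with c3 makes the subquery correlated.
qualified = conn.execute("""
    SELECT client_id,
           (SELECT MAX(t.TimesTested) FROM counted3 t
            WHERE c3.client_id = t.client_id) AS max_tested
    FROM counted3 c3
    GROUP BY client_id
    ORDER BY client_id
""").fetchall()
conn.close()

print(unqualified)  # [(1, 12), (110655, 12)]
print(qualified)    # [(1, 2), (110655, 12)]
```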

Thank you very much, that may well have been the problem, but unfortunately this query has been running for 15 minutes so far –
