2013-10-09 16 views
5

Calculating the respective z-scores of several columns

I am using a SQL query to determine the z-scores ((x - μ)/σ) of several columns.

Specifically, I have a table like this:

my_table 
id col_a col_b col_c 
1  3  6  5 
2  5  3  3 
3  2  2  9 
4  9  8  2 

...and I want to select the z-score of every number in every row, based on the mean and standard deviation of its column.

So the result would look like this:

id col_d  col_e  col_f 
1 -0.4343 1.0203 ... 
2  0.1434 -0.8729 
3 -0.8234 -1.2323 
4  1.889  1.5343 

Currently my code computes the scores for two columns and looks like this:

select id, 
    (my_table.col_a - avg(mya.col_a))/stddev(mya.col_a) as col_d, 
    (my_table.col_b - avg(myb.col_b))/stddev(myb.col_b) as col_e 
from my_table, 
    (select col_a from my_table) mya, 
    (select col_b from my_table) myb 
group by id; 

However, this is extremely slow. I have been waiting minutes for the query to finish.

Is there a better way to accomplish this? I am using Postgres, but any general-purpose language would help me. Thanks!

+0

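Some questions: 1) Why are you grouping by ID? If it is a primary key, you won't be grouping anything. 2) What is that `select col_a` doing there? 3) This is really a comment: if you aren't grouping anything, then `avg(value)` will equal `value`. –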

+0

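1) I don't need to group by ID, but Postgres says "column 'my_table.id' must appear in the GROUP BY clause", so for now it is there to avoid the error. 2) Those selects don't need to be in the query, that's true. – dmc7z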
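A note on why the query in the question is likely slow: listing `my_table`, `mya` and `myb` in the FROM clause with no join condition produces a cross product, so a table of N rows expands to N³ rows before the GROUP BY collapses them. The sketch below reproduces that row count. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, which is an assumption made only so the example is runnable; the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, col_a REAL, col_b REAL, col_c REAL)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, 3, 6, 5), (2, 5, 3, 3), (3, 2, 2, 9), (4, 9, 8, 2)],
)

# Same FROM clause as the question: three comma-separated sources, no join condition.
joined_rows = conn.execute(
    """
    SELECT count(*)
    FROM my_table,
         (SELECT col_a FROM my_table) mya,
         (SELECT col_b FROM my_table) myb
    """
).fetchone()[0]

base_rows = conn.execute("SELECT count(*) FROM my_table").fetchone()[0]
print(base_rows, joined_rows)  # 4 rows in the table, 4 * 4 * 4 = 64 rows after the join
```

By the same arithmetic, a table of 10,000 rows would produce 10¹² intermediate rows, which would be consistent with a query that runs for minutes.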

Answers

13

You can use window functions like this (`stddev` is the PostgreSQL name of the aggregate):

select 
    t.id, 
    (t.col_a - avg(t.col_a) over())/stddev(t.col_a) over() as col_d, 
    (t.col_b - avg(t.col_b) over())/stddev(t.col_b) over() as col_e 
from my_table as t 

or a cross join with the precomputed avg and stddev:

select 
    t.id, 
    (t.col_a - tt.col_a_avg)/tt.col_a_stdev as col_d, 
    (t.col_b - tt.col_b_avg)/tt.col_b_stdev as col_e 
from my_table as t 
    cross join (
     select 
      avg(tt.col_a) as col_a_avg, 
      avg(tt.col_b) as col_b_avg, 
      stddev(tt.col_a) as col_a_stdev, 
      stddev(tt.col_b) as col_b_stdev 
     from my_table as tt 
    ) as tt 
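
Both queries compute each aggregate once over the whole table and reuse it for every row. The following is a minimal runnable sketch of the cross-join variant, not a PostgreSQL session: it assumes Python's built-in `sqlite3` module as a stand-in, and because SQLite has no built-in `stddev`, the `SampleStddev` class is a hypothetical helper that registers one (sample standard deviation, which is what PostgreSQL's `stddev` computes).

```python
import sqlite3
import statistics


class SampleStddev:
    """Hypothetical helper: sample standard deviation aggregate for SQLite."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:  # skip NULLs, as SQL aggregates do
            self.values.append(value)

    def finalize(self):
        # The sample standard deviation is undefined for fewer than two values.
        return statistics.stdev(self.values) if len(self.values) > 1 else None


conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL (assumption)
conn.create_aggregate("stddev", 1, SampleStddev)
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, col_a REAL, col_b REAL)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 3, 6), (2, 5, 3), (3, 2, 2), (4, 9, 8)],
)

# The subquery returns exactly one row, so the cross join adds no extra rows.
rows = conn.execute(
    """
    SELECT t.id,
           (t.col_a - tt.col_a_avg) / tt.col_a_stddev AS col_d,
           (t.col_b - tt.col_b_avg) / tt.col_b_stddev AS col_e
    FROM my_table AS t
    CROSS JOIN (
        SELECT avg(col_a) AS col_a_avg, stddev(col_a) AS col_a_stddev,
               avg(col_b) AS col_b_avg, stddev(col_b) AS col_b_stddev
        FROM my_table
    ) AS tt
    ORDER BY t.id
    """
).fetchall()

for row in rows:
    print(row)  # (id, z-score of col_a, z-score of col_b)
```

A quick sanity check on any z-score column: its values sum to roughly zero.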
+2

Window functions. Exactly what I was looking for. Thanks! – dmc7z

+0

Great solution. But what if you have null values in the table? Then it becomes a zero/zero problem. –

+0

@OğuzCanSertel A simple `CASE` expression in the select statement is enough. – pimbrouwers
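
The guard that comment describes can be sketched outside SQL as well. The hypothetical `z_scores` helper below (not part of any answer here) returns `None` wherever a SQL query would want NULL: for missing inputs, and for every row when the standard deviation is zero or undefined. In PostgreSQL the zero case matters because dividing by zero raises an error rather than returning NULL.

```python
import statistics


def z_scores(values):
    """Return z-scores; None for missing inputs or when the spread is zero or undefined."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        # The sample standard deviation needs at least two values.
        return [None] * len(values)
    mean = statistics.mean(present)
    stddev = statistics.stdev(present)
    if stddev == 0:
        # All values identical: the z-score would be 0/0, so report it as missing.
        return [None] * len(values)
    return [None if v is None else (v - mean) / stddev for v in values]


print(z_scores([7, 7, 7, 7]))  # [None, None, None, None]
print(z_scores([3, None, 5]))  # the missing entry stays None
```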

-2

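I would start by selecting the AVG() and STDDEV() values into a table variable, then use that table for the calculation.

So you would end up with a table variable with the following columns: AVG_col_a, stddev_col_a, AVG_col_b, stddev_col_b ......

Like this: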

DECLARE @Table as table (AVG_col_a, stddev_col_a, AVG_col_b, stddev_col_b ......) 
INSERT into @Table 
SELECT AVG(col_A), stddev(col_a), ....... 
FROM myTable 

SELECT (m.col_a-AVG_col_a)/stddev_col_a as col_d, 
     (m.col_b-AVG_col_b)/stddev_col_b as col_e 
FROM myTable m, @Table 
+0

This won't work in PostgreSQL. –

+0

Then he can use a temporary table; he said any general-purpose language would help him, @mu is too short – Hedinn

0

Use a WITH clause:

WITH stats AS (SELECT avg (col_a) a_avg, stddev (col_a) a_stddev, 
         avg (col_b) b_avg, stddev (col_b) b_stddev 
        FROM my_table 
      ) 
SELECT id, (col_a - a_avg)/a_stddev col_d, 
      (col_b - b_avg)/b_stddev col_e 
    FROM my_table, stats 

But I like Roman's window-function solution better.

For Oğuz: dealing with NULL values in my_table:

WITH stats AS ( 
       SELECT avg (col_a) a_avg, stddev (col_a) as a_stddev, 
        avg (col_b) b_avg, stddev (col_b) as b_stddev 
        FROM my_table 
      ) 
SELECT id, 
     -- NULL inputs already yield NULL; NULLIF also turns a zero stddev into NULL instead of a division error
     (col_a - a_avg)/NULLIF(a_stddev, 0) col_d, 
     (col_b - b_avg)/NULLIF(b_stddev, 0) col_e 
FROM my_table, stats