Average over a hard-to-define partition

I have this table:

create table t (value int, dt date); 

 value |     dt
-------+------------
    10 | 2012-10-30
    15 | 2012-10-29
  null | 2012-10-28
  null | 2012-10-27
     7 | 2012-10-26

And I want this output:

 value |     dt
-------+------------
    10 | 2012-10-30
     5 | 2012-10-29
     5 | 2012-10-28
     5 | 2012-10-27
     7 | 2012-10-26

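With the table ordered by date descending, I want each run of nulls, together with the non-null value that precedes it, to be replaced by that non-null value averaged over all of those rows. In this example, 15 is the previous non-null value for the next two nulls, so each of the three rows becomes 15 / 3 = 5.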

SQL Fiddle

Source

2012-11-05 Clodoaldo Neto

+1 Very nice question. It has everything it needs. Well, I inferred PostgreSQL 9.2 from the fiddle. –

I found a surprisingly simple solution:

SELECT max(value) OVER (PARTITION BY grp)
     / count(*)   OVER (PARTITION BY grp) AS value
     , dt
FROM  (
    SELECT *, count(value) OVER (ORDER BY dt DESC) AS grp
    FROM   t
    ) a;

-> sqlfiddle

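Since count() ignores NULL values, a running count (the default behaviour of a window function with ORDER BY) quickly groups the rows (-> grp): the counter increases at each non-null value and stays the same across the nulls that follow it.

Every group has exactly one non-null value, so min, max or sum all return the same result in another window function. Divide by the number of members in the group (count(*) this time, so that the NULL rows are counted too!) and we are done.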
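To make the grouping step visible, here is a minimal sketch that runs only the inner query against the sample table t from the question. The outer ORDER BY is added purely for display, and the commented result shows nulls as "null" to match the question's notation.

```sql
-- Inner query only: the running count of non-null values becomes the group id.
SELECT value, dt,
       count(value) OVER (ORDER BY dt DESC) AS grp
FROM   t
ORDER  BY dt DESC;

--  value |     dt     | grp
-- -------+------------+-----
--     10 | 2012-10-30 |   1
--     15 | 2012-10-29 |   2
--   null | 2012-10-28 |   2
--   null | 2012-10-27 |   2
--      7 | 2012-10-26 |   3
```

Rows that come before the first non-null value (there are none in this sample) would land in group 0. That group has no non-null member, so max(value), and with it the result, stays NULL for them.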

Source

2012-11-05 19:12:29

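Nice, but it seems PostgreSQL-specific. – jsalvata

@jsalvata: "But"? Did you notice the [PostgreSQL] tag? Besides, this is standard SQL. -> sqlfiddle for SQL Server with identical query: http://www.sqlfiddle.com/#!6/fb11e/1 –

No, I didn't. Crappy MySQL doesn't support it. And yes, it is standard. – jsalvata

As a riddle, this is a solution... In practice, it may perform horribly depending on the nature of your data. Watch your indexes, in any case: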

create database tmp;
use tmp; -- select the new database before creating the table
create table t (value float, dt date); -- if you use int, you need to care about rounding
insert into t values (10, '2012-10-30'), (15, '2012-10-29'), (null, '2012-10-28'), (null, '2012-10-27'), (7, '2012-10-26');

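-- Intermediate step: for every row t1, show its non-null row t2 and the group size.
-- Note: selecting columns that are not in the group by relies on MySQL's
-- non-strict group by (ONLY_FULL_GROUP_BY disabled).
select t1.dt, t1.value, t2.dt, t2.value, count(*) cnt
from t t1, t t2, t t3
where
    -- t2: the nearest non-null row at or after t1
    t2.dt >= t1.dt and t2.value is not null
    and not exists (
        select *
        from t
        where t.dt < t2.dt and t.dt >= t1.dt and t.value is not null
    )
    -- t3: every row in t2's group (t2 itself and the nulls just before it)
    and t3.dt <= t2.dt
    and not exists (
        select *
        from t
        where t.dt >= t3.dt and t.dt < t2.dt and t.value is not null
    )
group by t1.dt;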

+------------+-------+------------+-------+-----+
| dt         | value | dt         | value | cnt |
+------------+-------+------------+-------+-----+
| 2012-10-26 |     7 | 2012-10-26 |     7 |   1 |
| 2012-10-27 |  NULL | 2012-10-29 |    15 |   3 |
| 2012-10-28 |  NULL | 2012-10-29 |    15 |   3 |
| 2012-10-29 |    15 | 2012-10-29 |    15 |   3 |
| 2012-10-30 |    10 | 2012-10-30 |    10 |   1 |
+------------+-------+------------+-------+-----+
5 rows in set (0.00 sec)

-- Final query: divide each group's non-null value by the group size.
select dt, value/cnt
from (
    select t1.dt, t2.value, count(*) cnt
    from t t1, t t2, t t3
    where
        t2.dt >= t1.dt and t2.value is not null
        and not exists (
            select *
            from t
            where t.dt < t2.dt and t.dt >= t1.dt and t.value is not null
        )
        and t3.dt <= t2.dt
        and not exists (
            select *
            from t
            where t.dt >= t3.dt and t.dt < t2.dt and t.value is not null
        )
    group by t1.dt
) x;

+------------+-----------+
| dt         | value/cnt |
+------------+-----------+
| 2012-10-26 |         7 |
| 2012-10-27 |         5 |
| 2012-10-28 |         5 |
| 2012-10-29 |         5 |
| 2012-10-30 |        10 |
+------------+-----------+
5 rows in set (0.00 sec)
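The comment on the create table statement above warns about rounding when value is an int. As a rough illustration (the literal 16 is a made-up value, chosen only because it does not divide evenly by 3), the sketch below contrasts a plain division with one that casts to a decimal type first. How the plain division behaves depends on the engine: PostgreSQL and SQL Server truncate int / int, while MySQL's / operator returns a decimal.

```sql
-- Hypothetical value 16 spread over 3 rows, to make the remainder visible.
SELECT 16 / 3                         AS plain_div,   -- 5 in PostgreSQL and SQL Server (integer division)
       CAST(16 AS decimal(10, 2)) / 3 AS decimal_div; -- keeps the fractional part
```

Applying the same cast to the numerator of value/cnt (or of the window-function expression) avoids silent truncation when the column is an integer.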

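Explanation:

t1 is the original table
t2 is the rows with non-null values
t3 is all the rows in between, so we can group by the others and count

Sorry, I can't make it any clearer. It's confusing for me too :-)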
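One piece of the query may be easier to follow in isolation. The sketch below is not the answer's method but a simplified stand-in for the t2 step: a correlated subquery that finds, for each row, the date of the nearest non-null row at or after it. The alias anchor_dt is a name introduced here for illustration.

```sql
-- For each row t1, find the date of the nearest non-null row at or after it
-- (the row the explanation calls t2). anchor_dt is a hypothetical alias.
SELECT t1.dt,
       t1.value,
       (SELECT min(t2.dt)
        FROM   t t2
        WHERE  t2.dt >= t1.dt
        AND    t2.value IS NOT NULL) AS anchor_dt
FROM   t t1
ORDER  BY t1.dt;
```

On the sample data this should reproduce the third column of the intermediate result above: 2012-10-27 and 2012-10-28 both point at 2012-10-29, and every non-null row points at itself. Rows sharing an anchor_dt form one group, which is what t3 counts.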

Source

2012-11-05 18:34:14 jsalvata

If it's too complicated to explain, chances are it's too complicated. :) –

Indeed. Clodoaldo's edit looks almost understandable. – jsalvata
