2017-06-21

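In Redshift, is there a way to filter for the records closest to a given value?

I'm using the PERCENT_RANK() function to get percentile measures for a given dataset. Here's the query: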

WITH time_values AS (
    SELECT 
     var, 
     (end_time - start_time) * 1.0/3600000000 AS num_hours, 
     PERCENT_RANK() OVER (PARTITION BY var1 ORDER BY num_hours) AS pct_rank 
    FROM table 
    WHERE 
     start_time >= 1493596800000000 
     AND end_time < 1493683200000000 
) 
SELECT 
    var, 
    pct_rank, 
    num_hours 
FROM time_values 
WHERE pct_rank IN (0.25, 0.5, 0.8, 0.99) 
ORDER BY 1, 2; 

However, given the way PERCENT_RANK() works, I won't get an exact match for every percentile I care about, so the output looks like this:

 var | pct_rank |    num_hours
-----+----------+------------------
 a   |     0.25 |  31.752826672222
 a   |      0.5 | 171.844016125555
 b   |     0.25 | 230.704589953055
 b   |      0.5 | 246.269648327222

I'm looking for a way to return a value for each percentile I care about or, if no exact match is found, the value closest to that percentile. Is this doable?

Answers


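You can number the records in sorted order, then for each percentile take the maximum value among the rows whose rank falls before that cutoff: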

WITH time_values AS (
    SELECT 
     var, 
     (end_time - start_time) * 1.0/3600000000 AS num_hours, 
     row_number() OVER (PARTITION BY var1 ORDER BY num_hours) AS rank, 
     count(1) OVER (PARTITION BY var1) AS records 
    FROM table 
    WHERE 
     start_time >= 1493596800000000 
     AND end_time < 1493683200000000 
) 
SELECT 
    var, 
    -- "records" is the per-partition row count computed in the CTE above
    max(case when 1.0*rank/records<0.25 then num_hours end) as percentile_25, 
    max(case when 1.0*rank/records<0.50 then num_hours end) as percentile_50, 
    max(case when 1.0*rank/records<0.80 then num_hours end) as percentile_80, 
    max(case when 1.0*rank/records<0.99 then num_hours end) as percentile_99 
FROM time_values 
GROUP BY var 
ORDER BY 1; 

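Or do the same with the PERCENT_RANK() output. If you really want the output row-wise rather than column-wise, UNION the results in a final step to get the desired structure.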
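
To make the row-wise variant concrete, here is a minimal sketch of one way to pick, for each target percentile, the row whose PERCENT_RANK() is closest to it. Treat it as an illustration under assumptions rather than a tested Redshift query: the sample data, the `targets` and `closest` CTEs and the `closest_percentiles` helper are made up for the example, `time_values` is assumed to already hold `var` and `num_hours`, the partition is assumed to be `var`, and Python's sqlite3 (SQLite 3.25 or later, for window functions) is used only so the SQL can be run locally. The SQL inside `QUERY` is the part meant to carry over.

```python
import sqlite3

# Hypothetical sample data: (var, num_hours). Column names mirror the question.
rows = [("a", float(h)) for h in (1, 2, 3, 4, 5)]
rows += [("b", float(h)) for h in (10, 20, 30, 40, 50, 60, 70, 80, 90)]

QUERY = """
WITH ranked AS (
    SELECT var,
           num_hours,
           PERCENT_RANK() OVER (PARTITION BY var ORDER BY num_hours) AS pct_rank
    FROM time_values
),
targets AS (
    -- The percentiles of interest, one row each.
    SELECT 0.25 AS target
    UNION ALL SELECT 0.5
    UNION ALL SELECT 0.8
    UNION ALL SELECT 0.99
),
closest AS (
    -- Pair every row with every target and rank the pairs by distance.
    -- Equal distances are broken in favour of the lower pct_rank.
    SELECT r.var,
           t.target,
           r.pct_rank,
           r.num_hours,
           ROW_NUMBER() OVER (
               PARTITION BY r.var, t.target
               ORDER BY ABS(r.pct_rank - t.target), r.pct_rank
           ) AS rn
    FROM ranked r
    CROSS JOIN targets t
)
SELECT var, target, pct_rank, num_hours
FROM closest
WHERE rn = 1
ORDER BY var, target
"""


def closest_percentiles(data):
    """Load (var, num_hours) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE time_values (var TEXT, num_hours REAL)")
    conn.executemany("INSERT INTO time_values VALUES (?, ?)", data)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = closest_percentiles(rows)
for row in result:
    print(row)
```

With this sample data, 0.25 and 0.5 have exact matches, while 0.8 falls back to the row at pct_rank 0.75 and 0.99 to the row at 1.0. The cross join produces rows × targets pairs, so on a large table it may be worth filtering `ranked` down first.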
