2017-08-09

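Percentile functions with GROUP BY in BigQuery

In my CENSUS table, I want to group by state and, for each state, get the median county population and the number of counties.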

In psql, Redshift, and Snowflake, I can do this:

psql=> SELECT state, count(county), PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "population2000") AS median FROM CENSUS GROUP BY state; 
     state   | count | median 
----------------------+-------+---------- 
Alabama    | 67 | 36583 
Alaska    | 24 | 7296.5 
Arizona    | 15 | 116320 
Arkansas    | 75 | 20229 
... 
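
For readers unfamiliar with the function, here is a minimal Python sketch of what PERCENTILE_CONT(0.5) computes per group: it sorts the values and interpolates linearly between the two neighbouring rows, which is why a state with an even number of counties (Alaska, 24) can have a fractional median such as 7296.5. The sample rows and the helper names `percentile_cont` and `median_by_state` are hypothetical and exist only for illustration; this is not how any of the databases implements it.

```python
from collections import defaultdict


def percentile_cont(values, fraction):
    """Continuous percentile: linear interpolation between neighbouring sorted values.

    NULLs (None) are ignored, as the SQL function does.
    """
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    position = fraction * (len(ordered) - 1)  # zero-based fractional row index
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def median_by_state(rows):
    """Emulate: SELECT state, count(county), PERCENTILE_CONT(0.5) ... GROUP BY state."""
    counties = defaultdict(int)
    populations = defaultdict(list)
    for state, county, population in rows:
        if county is not None:  # count(county) skips NULL counties
            counties[state] += 1
        populations[state].append(population)
    return {
        state: (counties[state], percentile_cont(pops, 0.5))
        for state, pops in populations.items()
    }


# Hypothetical sample data: (state, county, population2000)
rows = [
    ("A", "a1", 10), ("A", "a2", 20), ("A", "a3", 40), ("A", "a4", 100),
    ("B", "b1", 5), ("B", "b2", 7), ("B", "b3", 9),
]
result = median_by_state(rows)
print(result)
```

With four counties in state A the median falls halfway between 20 and 40; with three counties in state B it is simply the middle value.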

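I'm trying to find a good way to do this in BigQuery standard SQL. I noticed that there is an undocumented percentile_cont analytic function available, but I have to resort to some major hacks to get it to do what I want.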

I would like to be able to do the same kind of thing, using what I gather are the correct parameters:

SELECT
    state,
    COUNT(county),
    PERCENTILE_CONT(population2000, 0.5) OVER() AS `medPop`
FROM
    CENSUS
GROUP BY
    state;

But this query produces the error:

SELECT list expression references column population2000 which is neither grouped nor aggregated at 

I can get the answer I want with the query below, but I would be very disappointed if this were the recommended way to do it:

SELECT
    MAX(nCounties) AS nCounties,
    state,
    MAX(medPop) AS medPop
FROM (
    SELECT
        nCounties,
        T1.state,
        (PERCENTILE_CONT(population2000, 0.5) OVER (PARTITION BY T1.state)) AS `medPop`
    FROM
        census T1
    LEFT OUTER JOIN (
        SELECT
            COUNT(county) AS `nCounties`,
            state
        FROM
            census
        GROUP BY
            state) T2
    ON
        T1.state = T2.state) T3
GROUP BY
    state

Is there a better way to do what I want? Also, is the PERCENTILE_CONT function documented anywhere?

Thanks for reading!

Answer


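Thanks for your interest. PERCENTILE_CONT is under development, and we will publish the documentation once it reaches GA. We will support it as an analytic function first, and we plan to support it as an aggregate function (which allows GROUP BY) later. Between the two releases, a simple workaround is: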

SELECT
    state,
    ANY_VALUE(nCounties) AS nCounties,
    ANY_VALUE(medPop) AS medPop
FROM (
    SELECT
        state,
        COUNT(county) OVER (PARTITION BY state) AS nCounties,
        PERCENTILE_CONT(population2000, 0.5) OVER (PARTITION BY state) AS medPop
    FROM
        CENSUS)
GROUP BY
    state
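
To see why ANY_VALUE is safe in this workaround, the two stages can be emulated in a rough Python sketch. Stage 1 mimics the inner query: the window functions emit one row per county, and every row in a state carries the same count and median. Stage 2 mimics the outer GROUP BY: since the values are constant within a state, picking any row gives the same result. The sample rows and the function names `windowed` and `collapse` are hypothetical, and `statistics.median` is used as a stand-in that matches PERCENTILE_CONT only for the 0.5 case.

```python
from statistics import median


def windowed(rows):
    """Stage 1: emulate the OVER (PARTITION BY state) columns, one output row per input row."""
    partitions = {}
    for state, county, population in rows:
        partitions.setdefault(state, []).append((county, population))
    output = []
    for state, _county, _population in rows:
        part = partitions[state]
        n_counties = sum(1 for c, _ in part if c is not None)
        med_pop = median(p for _, p in part if p is not None)
        output.append((state, n_counties, med_pop))
    return output


def collapse(window_rows):
    """Stage 2: emulate GROUP BY state with ANY_VALUE by keeping the first row seen per state."""
    summary = {}
    for state, n_counties, med_pop in window_rows:
        summary.setdefault(state, (n_counties, med_pop))
    return summary


# Hypothetical sample data: (state, county, population2000)
rows = [
    ("A", "a1", 10), ("A", "a2", 20), ("A", "a3", 40), ("A", "a4", 100),
    ("B", "b1", 5), ("B", "b2", 7), ("B", "b3", 9),
]
stage1 = windowed(rows)
summary = collapse(stage1)
print(summary)
```

Stage 1 returns as many rows as the table has, but only one distinct row per state, which is what makes the outer aggregation a pure de-duplication step.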

Update: We have published the documentation for PERCENTILE_CONT at https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#.