Summarizing/merging/combining reversed pairs in BigQuery

I have a table like the one below, in which counts for a pair relationship are often also recorded with the pair in reverse order.

country1 country2 count 
CHN   KOR   65 
TWN   KOR   32 
KOR   CHN   43

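Here I have both CHN - KOR and KOR - CHN. I have already determined that these are distinct counts, so the two rows are just two ways of describing the same relationship. I would like to sum the counts of each pair, so that the final result is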

country1 country2 count 
CHN   KOR   108 
TWN   KOR   32

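I am using BigQuery. Does anyone know a way to consolidate reversed pairs in SQL? Note: these rows are not duplicates, so this is not a question of removing duplicates but of combining reversed pairs.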

2017-03-01 Nate Miller

Here is one method:

select country1, country2, sum(count) 
from ((select country1, country2, count 
     from t 
     where country1 <= country2 
    ) union all 
     (select country2, country1, count 
     from t 
     where country1 > country2 
    ) 
    ) cc 
group by country1, country2;

This will work in both the legacy and the standard SQL interfaces. For standard SQL, BigQuery supports greatest() and least() on strings:

select least(country1, country2), greatest(country1, country2), sum(count) 
from ((select country1, country2, count 
     from t 
     where country1 <= country2 
    ) union all 
     (select country2, country1, count 
     from t 
     where country1 > country2 
    ) 
    ) cc 
group by 1, 2;
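
Because least() and greatest() already put each pair into a canonical order, the union all subquery is arguably not needed in the standard SQL variant. The following is a minimal sketch of that simplification, not the answerer's own query. It assumes the question's sample rows inlined as a CTE named t, and total_count is an illustrative alias.

```sql
#standardSQL
-- Sample rows from the question, inlined so the query is self-contained.
WITH t AS (
  SELECT 'CHN' AS country1, 'KOR' AS country2, 65 AS count UNION ALL
  SELECT 'TWN', 'KOR', 32 UNION ALL
  SELECT 'KOR', 'CHN', 43
)
SELECT
  LEAST(country1, country2) AS country1,     -- alphabetically smaller code first
  GREATEST(country1, country2) AS country2,  -- alphabetically larger code second
  SUM(count) AS total_count                  -- combined count for the unordered pair
FROM t
GROUP BY 1, 2
```

With this data the query should return CHN, KOR with 108 and KOR, TWN with 32. Note that the TWN - KOR row comes back as KOR - TWN, since each pair is reported in alphabetical order rather than in the order it was stored.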

2017-03-01 03:50:54

Just a small correction: BigQuery standard SQL does not allow expressions in GROUP BY, so your solution should be corrected to "group by 1,2".

@MoshaPasumansky . . . Thank you.

Another option, which shows off the power and coolness of BigQuery standard SQL:

#standardSQL 
WITH pairs AS (
    SELECT 
    (SELECT STRING_AGG(country ORDER BY country) 
     FROM UNNEST(ARRAY[country1, country2]) AS country 
    ) AS countries, 
    SUM(COUNT) AS COUNT 
    FROM yourTable 
    GROUP BY countries 
) 
SELECT 
    REGEXP_EXTRACT(countries, r'(\w+),') AS country1, 
    REGEXP_EXTRACT(countries, r',(\w+)') AS country2, 
    COUNT 
FROM pairs

This version can be the better choice when more than just two fields may appear in the "wrong order".

You can quickly test it with the dummy data below:

#standardSQL 
WITH yourTable AS (
SELECT 'CHN' AS country1, 'KOR' AS country2, 65 AS COUNT UNION ALL 
SELECT 'TWN', 'KOR', 32 UNION ALL 
SELECT 'KOR', 'CHN', 43 
)
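
The WITH clause above has no final SELECT, so it does not run on its own. One way to test it, sketched here on the assumption that the dummy data is simply placed in front of the pairs query shown earlier, is to chain the two CTEs:

```sql
#standardSQL
WITH yourTable AS (
  SELECT 'CHN' AS country1, 'KOR' AS country2, 65 AS count UNION ALL
  SELECT 'TWN', 'KOR', 32 UNION ALL
  SELECT 'KOR', 'CHN', 43
),
pairs AS (
  SELECT
    -- Sort the two codes and join them with the default ',' separator,
    -- so CHN/KOR and KOR/CHN both become 'CHN,KOR'.
    (SELECT STRING_AGG(country ORDER BY country)
     FROM UNNEST(ARRAY[country1, country2]) AS country
    ) AS countries,
    SUM(count) AS count
  FROM yourTable
  GROUP BY countries
)
SELECT
  REGEXP_EXTRACT(countries, r'(\w+),') AS country1,  -- code before the comma
  REGEXP_EXTRACT(countries, r',(\w+)') AS country2,  -- code after the comma
  count
FROM pairs
```

With this data the result should be CHN, KOR with 108 and KOR, TWN with 32.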

And below is the case where more than two fields are shuffled:

#standardSQL 
WITH yourTable AS (
SELECT 'CHN' AS country1, 'KOR' AS country2, 'US' as country3, 65 AS COUNT UNION ALL 
SELECT 'TWN', 'KOR', 'GB', 32 UNION ALL 
SELECT 'KOR', 'US', 'CHN', 43 
), 
pairs AS (
    SELECT 
    (SELECT STRING_AGG(country ORDER BY country) 
     FROM UNNEST(ARRAY[country1, country2, country3]) AS country 
    ) AS countries, 
    SUM(COUNT) AS COUNT 
    FROM yourTable 
    GROUP BY countries 
) 
SELECT 
    REGEXP_EXTRACT(countries, r'(\w+),\w+,\w+') AS country1, 
    REGEXP_EXTRACT(countries, r'\w+,(\w+),\w+') AS country2, 
    REGEXP_EXTRACT(countries, r'\w+,\w+,(\w+)') AS country3, 
    COUNT 
FROM pairs

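Of course, this quick example could be optimized further, but the main point here is that the shuffle logic needs no multiple comparisons and the like.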

Addition

Thanks to @GordonLinoff for insisting in the comments below! I think you are right: it is more elegant to use ARRAY_AGG here.

#standardSQL 
WITH yourTable AS (
SELECT 'CHN' AS country1, 'KOR' AS country2, 'US' AS country3, 65 AS count UNION ALL 
SELECT 'TWN', 'KOR', 'GB', 32 UNION ALL 
SELECT 'KOR', 'US', 'CHN', 43 
), 
pairs AS (
    SELECT 
    (SELECT ARRAY_AGG(country ORDER BY country) 
     FROM UNNEST(ARRAY[country1, country2, country3]) AS country 
    ) AS countries, 
    count 
    FROM yourTable 
) 
SELECT 
    countries[OFFSET(0)] AS country1, 
    countries[OFFSET(1)] AS country2, 
    countries[OFFSET(2)] AS country3, 
    SUM(count) AS count 
FROM pairs 
GROUP BY 1, 2, 3

2017-03-01 01:16:49

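Why did you use `string_agg()` instead of `array_agg()` and just pulling out the elements?

@GordonLinoff - you can try that too. I just wanted to show a direction different from the classic/obvious one given in the other answers. If you like it, you can use this function :o)

If you had used arrays, I would have upvoted. Using a string rather than an array to represent a list is not general, and it is simply inelegant.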
