Another option, to show off the power and coolness of BigQuery Standard SQL:
#standardSQL
WITH pairs AS (
  SELECT
    -- Sort the two values and join them, so (A, B) and (B, A) yield the same key
    (SELECT STRING_AGG(country ORDER BY country)
     FROM UNNEST(ARRAY[country1, country2]) AS country
    ) AS countries,
    SUM(COUNT) AS COUNT
  FROM yourTable
  GROUP BY countries
)
SELECT
  -- Split the comma-separated key back into separate columns
  REGEXP_EXTRACT(countries, r'(\w+),') AS country1,
  REGEXP_EXTRACT(countries, r',(\w+)') AS country2,
  COUNT
FROM pairs
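This version scales better when more than two fields can appear in the "wrong order".
You can quickly test it with the dummy data below (use it as the first CTE, followed by a comma and the pairs CTE from the query above):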
#standardSQL
WITH yourTable AS (
  SELECT 'CHN' AS country1, 'KOR' AS country2, 65 AS COUNT UNION ALL
  SELECT 'TWN', 'KOR', 32 UNION ALL
  SELECT 'KOR', 'CHN', 43
)
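To sanity-check what the query should return for this dummy data, the same sort, join, and group steps can be mirrored outside BigQuery. The sketch below is Python rather than SQL, `aggregate_pairs` is a hypothetical helper, and it only imitates the STRING_AGG and REGEXP_EXTRACT steps; it is not BigQuery code.

```python
import re
from collections import defaultdict

# Dummy rows from the test data: (country1, country2, count)
rows = [
    ("CHN", "KOR", 65),
    ("TWN", "KOR", 32),
    ("KOR", "CHN", 43),
]

def aggregate_pairs(rows):
    """Sum counts per unordered country pair (hypothetical helper)."""
    totals = defaultdict(int)
    for country1, country2, count in rows:
        # Same idea as STRING_AGG(country ORDER BY country): sort, then join.
        key = ",".join(sorted([country1, country2]))
        totals[key] += count
    result = []
    for key, total in totals.items():
        # Same idea as the two REGEXP_EXTRACT calls in the final SELECT.
        first = re.search(r"(\w+),", key).group(1)
        second = re.search(r",(\w+)", key).group(1)
        result.append((first, second, total))
    return sorted(result)

print(aggregate_pairs(rows))
# [('CHN', 'KOR', 108), ('KOR', 'TWN', 32)]
```

The CHN/KOR and KOR/CHN rows collapse into a single row because both sort to the same key.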
And below is the case when more than two fields are shuffled:
#standardSQL
WITH yourTable AS (
  SELECT 'CHN' AS country1, 'KOR' AS country2, 'US' AS country3, 65 AS COUNT UNION ALL
  SELECT 'TWN', 'KOR', 'GB', 32 UNION ALL
  SELECT 'KOR', 'US', 'CHN', 43
),
pairs AS (
  SELECT
    (SELECT STRING_AGG(country ORDER BY country)
     FROM UNNEST(ARRAY[country1, country2, country3]) AS country
    ) AS countries,
    SUM(COUNT) AS COUNT
  FROM yourTable
  GROUP BY countries
)
SELECT
  REGEXP_EXTRACT(countries, r'(\w+),\w+,\w+') AS country1,
  REGEXP_EXTRACT(countries, r'\w+,(\w+),\w+') AS country2,
  REGEXP_EXTRACT(countries, r'\w+,\w+,(\w+)') AS country3,
  COUNT
FROM pairs
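Of course, the quick example above can be optimized further, but the main focus here is on shuffle logic that does not require multiple comparisons and the like.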
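The point about avoiding multiple comparisons can be illustrated with a small sketch, again in Python rather than SQL. `canonical_key` is a hypothetical name; it stands in for the sort done by `ORDER BY country` inside the aggregate, and it shows that one sort gives the same key for every ordering of the same values, however many fields there are.

```python
from itertools import permutations

def canonical_key(*fields):
    """Return the fields in sorted order (hypothetical helper)."""
    # One sort replaces the chain of pairwise comparisons that would
    # otherwise be needed to decide which value goes first, second, ...
    return tuple(sorted(fields))

# Every ordering of the same three values maps to the same key.
keys = {canonical_key(*p) for p in permutations(["KOR", "US", "CHN"])}
print(keys)
# {('CHN', 'KOR', 'US')}
```

Adding a fourth or fifth field changes only the list being sorted, not the logic.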
Addition:
Thank you @GordonLinoff for insisting on the option below! I think you are right: it is more elegant to use ARRAY_AGG here.
#standardSQL
WITH yourTable AS (
  SELECT 'CHN' AS country1, 'KOR' AS country2, 'US' AS country3, 65 AS count UNION ALL
  SELECT 'TWN', 'KOR', 'GB', 32 UNION ALL
  SELECT 'KOR', 'US', 'CHN', 43
),
pairs AS (
  SELECT
    (SELECT ARRAY_AGG(country ORDER BY country)
     FROM UNNEST(ARRAY[country1, country2, country3]) AS country
    ) AS countries,
    count
  FROM yourTable
)
SELECT
  countries[OFFSET(0)] AS country1,
  countries[OFFSET(1)] AS country2,
  countries[OFFSET(2)] AS country3,
  SUM(count) AS count
FROM pairs
GROUP BY 1, 2, 3
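One possible reason the array form is more robust, offered here as an illustration rather than something stated in this thread: the string version has to parse values back out with `\w+`, which stops at any non-word character. The Python sketch below uses a made-up value `"A-B"` and assumes BigQuery's regex engine behaves the same way for this pattern.

```python
import re

values = ["KOR", "A-B"]  # "A-B" is a made-up value containing a non-word character

# String route: sort, join with commas, then parse the first field back out.
joined = ",".join(sorted(values))
first_from_string = re.search(r"(\w+),", joined).group(1)

# Array route: sort and index directly, no parsing step.
first_from_array = sorted(values)[0]

print(joined)             # A-B,KOR
print(first_from_string)  # B
print(first_from_array)   # A-B
```

With an array there is nothing to re-parse, so the values come back exactly as they went in.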
Just a small correction: BigQuery Standard SQL does not allow expressions in GROUP BY, so your solution should be corrected to "group by 1,2".
@MoshaPasumansky . . . Thank you.