2017-08-28 72 views

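BigQuery regular expression

I have a table with the data below. I am trying to extract the second field when the string is split on "_"; that field should have the form [numbers-numbers | numbers-numbers]. I tried regexp_extract but could not get the desired result.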

Please suggest how to do this.

Data:

                                          output
D22_022-010|022-009_84233|669250   345    022-010  172.5
D22_022-010|022-009_666249|843250  22     022-009  172.5
D28I_28-04_5042|44182_250          235    022-010  11
D22_022-010|022-009_8423250        232    022-009  11
D23_23-06_NA_FW27_D23_600          22     28-04    235
D21_21-08_NA_FW14_D21_50           56     022-010  116
D23_23-06_NA_FW27_D23_90           88     022-009  116
D21_21-08_NA_FW14_D21_50           99     23-06    22
G | TR | Search : 56021            89     21-08    56
Free Sprayer_1x1(3.30)             77     23-06    88
Click Tracker (5.4)                33     23-06    99
6.1 FW18_D28o_Click                4      21-08    89
                                          null     77
                                          null     33
                                          null     4

Table Data

What did you try with REGEXP_EXTRACT?

I tried this query to extract the second field, but got the error "Array index 1 is out of bounds (overflow)". SELECT REGEXP_EXTRACT(SPLIT(AD, "_")[OFFSET(1)], '[0-9]+-[0-9]+') AS ad FROM (SELECT "G | TR | Search : 56021" AS AD) – KeepLearn

That string has no underscores. There is only something at offset 1 if the split produces at least two elements.
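
To illustrate the point about offset 1: one way to keep the SPLIT-based attempt from failing on strings without an underscore is SAFE_OFFSET, which returns NULL instead of raising when the index is out of range. The following is only a minimal sketch. The `sample` CTE and the `second_field` alias are made up for illustration, and the two literals are taken from the question's data.

```sql
#standardSQL
-- Sketch only: `sample` stands in for the real table, `ad` for the real column.
WITH sample AS (
    SELECT 'G | TR | Search : 56021' AS ad UNION ALL
    SELECT 'D28I_28-04_5042|44182_250'
)
SELECT
    ad,
    -- SAFE_OFFSET(1) yields NULL when the split has fewer than two elements,
    -- and REGEXP_EXTRACT of NULL is NULL, so no error is raised.
    REGEXP_EXTRACT(SPLIT(ad, '_')[SAFE_OFFSET(1)], r'[0-9]+-[0-9]+') AS second_field
FROM sample
```

With these two rows, the first should come back as NULL and the second as 28-04. Note that this pattern returns only the first numbers-numbers pair in the field, so a value such as 022-010|022-009 would still need further handling.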

Answer

Below is for BigQuery Standard SQL.

Assuming your columns are ad and value, the query below should do what you asked:

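#standardSQL 
-- Split each row's value evenly across the numbers-numbers items found in ad.
-- IFNULL covers rows with no match, where items (and so the division) is NULL.
SELECT item, ROUND(IFNULL(value/ARRAY_LENGTH(items), value)) AS split_value 
FROM (
    SELECT value, 
    -- Capture the first "_"-prefixed run of numbers-numbers groups joined by "|", then split it on "|".
    SPLIT(REGEXP_EXTRACT(ad, '_((?:[0-9]+-[0-9]+)(?:\\|(?:[0-9]+-[0-9]+))*)'),'|') AS items 
    FROM `yourProject.yourDataset.yourTable` 
) LEFT JOIN UNNEST(items) AS item 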

You can test it with the dummy data from your question:

#standardSQL 
WITH `yourTable` AS (
    SELECT 'D22_022-010|022-009_84233|669250' AS ad, 345 AS value UNION ALL 
    SELECT 'D22_022-010|022-009_666249|843250', 22 UNION ALL 
    SELECT 'D28I_28-04_5042|44182_250', 235 UNION ALL 
    SELECT 'D22_022-010|022-009_8423250', 232 UNION ALL 
    SELECT 'D23_23-06_NA_FW27_D23_600', 22 UNION ALL 
    SELECT 'D21_21-08_NA_FW14_D21_50', 56 UNION ALL 
    SELECT 'D23_23-06_NA_FW27_D23_90', 88 UNION ALL 
    SELECT 'D21_21-08_NA_FW14_D21_50', 99 UNION ALL 
    SELECT 'G | TR | Search : 56021', 89 UNION ALL 
    SELECT 'Free Sprayer_1x1(3.30)', 77 UNION ALL 
    SELECT 'Click Tracker (5.4)', 33 UNION ALL 
    SELECT '6.1 FW18_D28o_Click', 4 
) 
SELECT item, ROUND(IFNULL(value/ARRAY_LENGTH(items), value)) AS split_value 
FROM (
    SELECT value, 
    SPLIT(REGEXP_EXTRACT(ad, '_((?:[0-9]+-[0-9]+)(?:\\|(?:[0-9]+-[0-9]+))*)'),'|') AS items 
    FROM `yourTable` 
) LEFT JOIN UNNEST(items) AS item 

The result is (as you expected):

item split_value 
------- ----------- 
022-010  173.0 
022-009  173.0 
022-010  11.0 
022-009  11.0 
28-04   235.0 
022-010  116.0 
022-009  116.0 
23-06   22.0 
21-08   56.0 
23-06   88.0 
21-08   99.0 
null   89.0 
null   77.0 
null   33.0 
null   4.0