2017-03-07

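Regex to replace _ with - in Hive

I have string values in which the country abbreviation part sometimes contains two codes separated by an underscore and sometimes just one, like this: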

Cusco_DE_campaign_Million 
Manzan_ES_CA_order_stra 
Tijuan_FR_sitc_Mill 

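I want to replace the underscore with a hyphen only when the country abbreviation consists of two groups of two capital letters (such as CA_FR or ES_CA).

So the output should look like this: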

Cusco_DE_campaign_Million 
Manzan_ES-CA_order_stra 
Tijuan_FR_sitc_Mill 

How can I write this in Hive SQL using regexp_replace?

Thanks!

Answer

Replace an _ that is preceded by two uppercase letters (which in turn follow an _ or the start of the string)
and followed by two uppercase letters (which in turn precede an _ or the end of the string).

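-- Sample data: one row per test string
with t as
(
    select explode
    (
        array
        (
            'Cusco_DE_campaign_Million'
           ,'Manzan_ES_CA_order_stra'
           ,'Tijuan_FR_sitc_Mill'
        )
    ) as (val)
)
-- Replace only the underscore that sits between two 2-letter uppercase codes
select regexp_replace(val, '(?<=(^|_)[A-Z]{2})_(?=[A-Z]{2}(_|$))', '-')
from t
;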

+---------------------------+
| Cusco_DE_campaign_Million |
+---------------------------+
| Manzan_ES-CA_order_stra   |
+---------------------------+
| Tijuan_FR_sitc_Mill       |
+---------------------------+
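
Hive's regexp_replace follows Java regular expression syntax, so the pattern can also be checked outside Hive. The following is only a minimal sketch in plain Java, assuming the same three sample strings; the class name CountryCodeHyphen and the method hyphenate are made up for illustration and are not part of Hive.

```java
import java.util.regex.Pattern;

// Hypothetical helper for trying the pattern outside Hive (not a Hive API).
class CountryCodeHyphen {
    // Same pattern as in the query: the lookbehind and lookahead only
    // inspect the neighbouring text, so the underscore is the sole
    // character that gets replaced.
    private static final Pattern BETWEEN_CODES =
        Pattern.compile("(?<=(^|_)[A-Z]{2})_(?=[A-Z]{2}(_|$))");

    static String hyphenate(String val) {
        return BETWEEN_CODES.matcher(val).replaceAll("-");
    }

    public static void main(String[] args) {
        String[] samples = {
            "Cusco_DE_campaign_Million",
            "Manzan_ES_CA_order_stra",
            "Tijuan_FR_sitc_Mill"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + hyphenate(s));
        }
    }
}
```

Because the lookarounds do not consume the country codes, a value with three adjacent codes (for example X_ES_CA_FR_y) would have both inner underscores replaced.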

Are you still here?