2015-04-23

I need to extract the first part of each string in the input data below with a regular expression, in Apache Pig.

AB55 4 
DD7 6LL 
DD5 2HI 

My code:

A = load 'data' as postcode:chararray; 
B = foreach A { 
code_district = REGEX_EXTRACT(postcode,'<SOME EXP>',1); 
generate code_district; 
}; 
dump B; 

The output postcode districts should look like this:

AB55 
DD7 
DD5 

What regular expression should I use to extract the first part of the string?

Answer

Could you try one of the regular expressions below?

Option 1:

A = LOAD 'input' as postcode:chararray; 
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'(\\w+).*',1); 
DUMP code_district; 

Option 2:

A = LOAD 'input' as postcode:chararray; 
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'([a-zA-Z0-9]+).*',1); 
DUMP code_district; 

Output:

(AB55) 
(DD7) 
(DD5) 
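
Why both options work: each pattern captures the leading run of letters and digits in group 1 and lets `.*` consume the rest, so the capture stops at the first space. Pig runs on the JVM and, as far as I know, REGEX_EXTRACT hands the pattern to java.util.regex, so a pattern can be checked in plain Java before running a job. This is only a minimal sketch: the class and method names are made up for illustration, and using `find()` is an assumption about how Pig applies the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Pig.
class PostcodeRegexCheck {
    // Returns capture group 1 of the first match, or null when nothing matches.
    public static String firstPart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample rows from the question.
        String[] samples = {"AB55 4", "DD7 6LL", "DD5 2HI"};
        // The two patterns from the answer; backslashes are doubled in Java
        // source just as they are in the Pig script.
        String[] patterns = {"(\\w+).*", "([a-zA-Z0-9]+).*"};
        for (String p : patterns) {
            for (String s : samples) {
                System.out.println(p + " on \"" + s + "\" -> " + firstPart(p, s));
            }
        }
    }
}
```

Both patterns should give AB55, DD7 and DD5 for the three sample rows.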
This does not work for non-ASCII chararrays, i.e. ISO-8859-9.
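
One possible reason, assuming the data is already decoded correctly when it is loaded: in Java regular expressions `\w` and `[a-zA-Z0-9]` only cover ASCII by default, so a leading accented letter is skipped instead of captured. The Unicode property classes `\p{L}` (letters) and `\p{N}` (numbers) do not have that limit. The sketch below compares the two in plain Java; the sample value is made up, the class name is hypothetical, and I have not run the pattern through Pig itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Pig.
class UnicodeFirstTokenCheck {
    // Returns capture group 1 of the first match, or null when nothing matches.
    public static String firstPart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up value that starts with C-cedilla (U+00C7), a letter
        // that exists in ISO-8859-9.
        String sample = "\u00C7A5 4";

        // ASCII-only word class: the match starts after the first letter.
        System.out.println(firstPart("(\\w+).*", sample));

        // Unicode letters and digits: the first letter is kept.
        System.out.println(firstPart("([\\p{L}\\p{N}]+).*", sample));
    }
}
```

In a Pig script the second pattern would be written as `'([\\p{L}\\p{N}]+).*'`. If the file is read with the wrong character set in the first place, changing the regex will not help; the encoding has to be fixed at load time.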
