2015-04-28 49 views

Using REGEX_EXTRACT in Pig prints null

I want to split a string so that I can convert an area value. My data looks like this:

(149Sq.Yards) 
(151Sq.Yards) 
(190Sq.Yards) 
(190Sq.Yards) 

I want to split the data above into the number and the unit, like this:

149 sq.yards 
151 sq.yards 

I tried the code below.

a = LOAD '/user/ahmedabad/Makkan_PropertyDetails_Apartment_Ahmedabad.csv' using PigStorage('\t') as (SourceWebSite:chararray,PropertyID:chararray,ListedOn:chararray,ContactName:chararray,TotalViews:int,Price:chararray,PriceperArea:chararray,NoOfBedRooms:int,NoOfBathRooms:int,FloorNoOfProperty:chararray,TotalFloors:int,Possession:chararray,BuiltUpArea:chararray,Furnished:chararray,Ownership:chararray,NewResale:chararray,Facing:chararray,title:chararray,PropertyAddress:chararray,NearByFacilities:chararray,PropertyFeatures:chararray,Sellerinfo:chararray,Description:chararray); 
b = FOREACH a GENERATE BuiltUpArea; 
c = FILTER b BY (BuiltUpArea matches '.*Sq.Yards.*'); 
d = FOREACH c GENERATE (bigdecimal) REGEX_EXTRACT(BuiltUpArea,'(.*)', 1) * 9; 

But when I dump d, it prints null.

Answer


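The regular expression you used, '(.*)', matches every character of the field, so Pig ends up trying to compute (149Sq.Yards * 9). That string is not a number, which is why the output is null.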

The regular expression below extracts only the leading digits from the input, so the multiplication becomes (149 * 9).

d = FOREACH c GENERATE (bigdecimal) REGEX_EXTRACT(BuiltUpArea,'(^[0-9]+)', 1) * 9; 
dump d; 
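
Pig's REGEX_EXTRACT uses Java regular expressions, so the difference between the two patterns can also be checked outside Pig. The following is only a minimal Java sketch, not Pig's actual implementation: the class name AreaRegexDemo, its helper methods, and the sample value are made up for illustration, and the "return null when the text is not a number" helper is an assumption that simply mirrors the null reported in the question.

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; none of these names come from Pig.
class AreaRegexDemo {
    // Returns the text captured by group 1, or null if the pattern is not found.
    static String extract(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Assumption: a non-numeric string becomes null, as observed in the question.
    static BigDecimal toDecimalOrNull(String s) {
        if (s == null) {
            return null;
        }
        try {
            return new BigDecimal(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Extracts with the given regex, then multiplies by 9 as the Pig script does.
    static BigDecimal timesNine(String input, String regex) {
        BigDecimal n = toDecimalOrNull(extract(input, regex));
        return n == null ? null : n.multiply(BigDecimal.valueOf(9));
    }

    public static void main(String[] args) {
        String value = "149Sq.Yards";
        // '(.*)' captures the whole string, which is not a number.
        System.out.println(timesNine(value, "(.*)"));      // prints null
        // '(^[0-9]+)' captures only the leading digits.
        System.out.println(timesNine(value, "(^[0-9]+)")); // prints 1341
    }
}
```

With '(.*)' the captured group is the full text 149Sq.Yards, so there is nothing numeric to multiply. With '(^[0-9]+)' the captured group is 149, and the result is 1341.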

Thanks. It works.