Pig: calling external methods multiple times

I want to use the methods REPLACE, SUBSTRING and INDEXOF in my Pig script, but I can't find a clean way to combine them.
First case: REPLACE combined with REGEX_EXTRACT_ALL:

data_split = FOREACH data GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, MY_REGULAR_EXPRESSION)) AS (timestamp: chararray, url: chararray, REPLACE(url, '.*?://', '') AS clean_url: chararray);
I want to use REPLACE to strip the leading http:// from the URL. In this case I get:
Error during parsing. Encountered " "(" "(""
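To make the intended transformation concrete, here is a plain-Python model of the REPLACE call (not Pig code). It assumes Pig's REPLACE follows Java's String.replaceAll semantics, so every match of '.*?://' is removed, not only the first one; Python's re.sub behaves the same way for this pattern. The function names and the anchored pattern are hypothetical, added only for illustration.

```python
import re

# Pattern copied from the Pig script: lazily match everything up to "://".
SCHEME_PATTERN = r".*?://"


def clean_url(url: str) -> str:
    """Model of REPLACE(url, '.*?://', ''): removes every match, like replaceAll."""
    return re.sub(SCHEME_PATTERN, "", url)


def clean_url_anchored(url: str) -> str:
    """Hypothetical variant: strip a scheme only at the very start of the string."""
    return re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", url)


print(clean_url("http://www.example.com/index.html"))  # www.example.com/index.html
print(clean_url("https://a.com/r?to=http://b.com/x"))  # b.com/x
print(clean_url_anchored("https://a.com/r?to=http://b.com/x"))  # a.com/r?to=http://b.com/x
```

The second call shows a side effect of the unanchored pattern: a URL that embeds another "://" further along loses everything up to that point as well, which the anchored variant avoids.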
Second case: reusing the output:

ws = FOREACH data_split {
    clean_url = REPLACE(url, '.*?://', '');
    url_index = INDEXOF(clean_url, '/');
    web_server = SUBSTRING(clean_url, 0, url_index);
    GENERATE web_server, timestamp, ip;
};
This case does not work either: when I try to reuse clean_url from the previous call to REPLACE, I get:
Attempt to give operator of type org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc multiple outputs. This operator does not support multiple outputs.
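The nested block is meant to extract the web server (host part) of each URL. Below is a plain-Python sketch of the same three steps, not Pig code. It assumes Pig's semantics as I understand them: REPLACE is regex-based, INDEXOF returns -1 when the character is absent, and SUBSTRING's end index is exclusive. `web_server` is a hypothetical helper name.

```python
import re
from typing import Optional


def web_server(url: str) -> Optional[str]:
    # clean_url = REPLACE(url, '.*?://', '');
    clean = re.sub(r".*?://", "", url)
    # url_index = INDEXOF(clean_url, '/');  -> -1 when there is no '/'
    url_index = clean.find("/")
    # web_server = SUBSTRING(clean_url, 0, url_index);  end index is exclusive.
    # Assumption: Pig's SUBSTRING returns null (with a warning) for a negative
    # end index; modelled here as None.
    if url_index < 0:
        return None
    return clean[0:url_index]


for u in ["http://www.example.com/index.html", "ftp://host:21/a/b", "http://example.com"]:
    print(u, "->", web_server(u))
```

A URL with no path, such as http://example.com, makes INDEXOF yield -1, so that row would produce a null web_server rather than the host.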
Thanks