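Here is a regex-based script for your schema; it validates essentially all of the fields. Please run it against your input and let me know if you need any additional validation.
Regex: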
'^
([A-Za-z]+\\s+[A-Za-z]+)\\s*\\|\\s*
([A-Za-z]+)\\s*\\|\\s*
(\\{(?:\\([A-Za-z_]+,[0-9]+\\))(?:,\\([A-Za-z_]+,[0-9]+\\))*\\})\\s*\\|\\s*
(\\[(?:[A-Za-z_]+#[0-9\\.]+)(?:,[A-Za-z_]+#[0-9\\.]+)*\\])
$'
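To make the four capture groups easier to see, here is a minimal sketch that applies the same pattern outside Pig. It assumes Python's re module, which handles the constructs used here the same way; the names PATTERN, line and fields are only illustrative. The backslashes are single because the doubled \\ above is needed only inside a Pig string literal.

```python
import re

# Same pattern as above, split per field. Plain string concatenation is used
# instead of re.VERBOSE because the pattern contains a literal '#'.
PATTERN = re.compile(
    r"^([A-Za-z]+\s+[A-Za-z]+)\s*\|\s*"   # group 1: two alphabetic words (name)
    r"([A-Za-z]+)\s*\|\s*"                # group 2: one alphabetic word (team)
    # group 3: {(position,year),...} with at least one tuple
    r"(\{(?:\([A-Za-z_]+,[0-9]+\))(?:,\([A-Za-z_]+,[0-9]+\))*\})\s*\|\s*"
    # group 4: [key#value,...] with at least one pair
    r"(\[(?:[A-Za-z_]+#[0-9\.]+)(?:,[A-Za-z_]+#[0-9\.]+)*\])$"
)

line = ("Jorge Posada |Yankees| {(Catcher,2000),(Designated_hitter,2001)}"
        "|[games#1594,hit_by_pitch#65,grand_slams#7]")

match = PATTERN.fullmatch(line)
# An empty tuple stands in for a line that does not match.
fields = match.groups() if match else ()
print(fields)
```

The four captured strings correspond, in order, to the name, team, pos and bat fields.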
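input.txt:
I have marked each input line below as valid or invalid. The text after the arrow is my annotation and is not part of the file.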
Jorge Posada |Yankees| {(Catcher,2000),(Designated_hitter,2001)}|[games#1594,hit_by_pitch#65,grand_slams#7] -->Valid
Landon Powell |Oakland|{(Catcher,2000),(First_baseman,2001)}|[on_base_percentage#0.297,games#26,home_runs#7] -->Valid
Martin Prado |Atlanta| {(Second_baseman,2002),(Infielder,2003),(Left_fielder)}|[games#258,hit_by_pitch#3] -->Invalid, year is missing
Martin Prado |Atlanta| {(Second_baseman,2002)(Infielder,2003)}|[games#258,hit_by_pitch#3] -->Invalid, no comma between two tuples
Martin Prado |Atlanta| {,(Second_baseman,2002),(Infielder,2003)}|[games#258,hit_by_pitch#3] -->Invalid, comma before the first tuple
Martin Prado |Atlanta| {(Second_baseman,2002),(,2003)}|[games#258,hit_by_pitch#3] -->Invalid, position is missing
Martin Prado |Atlanta| {(Second_baseman,2002),(Infielder,2003)}[games#258,hit_by_pitch#3] -->Invalid, delimiter | is missing
Martin Prado || {(Second_baseman,2002),(Infielder,2003)}|[games#258,hit_by_pitch#3] -->Invalid, team name is missing
Martin Prado |Atlanta| {(Second_baseman,2002),(Infielder,2003)}|[games#,hit_by_pitch#3] -->Invalid, value is missing for the key games
Landon Powell |Oakland|{(Catcher,2000)}|[on_base_percentage#0.297] --> Valid
Landon Powell |Oakland|{(Catcher,2000),(First_baseman,2001),(test,3000)}|[on_base_percentage#0.297,games#26,home_runs#7,test#1.2] -->Valid
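The valid/invalid markings above can be checked quickly before running the Pig job. The following is only a sketch under the assumption that Python's re treats this pattern the same way as the Java regex engine Pig uses; is_valid, SAMPLES and results are hypothetical names, and only a few of the samples are repeated here.

```python
import re

# Same pattern as in the answer, with single backslashes.
PATTERN = re.compile(
    r"^([A-Za-z]+\s+[A-Za-z]+)\s*\|\s*"
    r"([A-Za-z]+)\s*\|\s*"
    r"(\{(?:\([A-Za-z_]+,[0-9]+\))(?:,\([A-Za-z_]+,[0-9]+\))*\})\s*\|\s*"
    r"(\[(?:[A-Za-z_]+#[0-9\.]+)(?:,[A-Za-z_]+#[0-9\.]+)*\])$"
)


def is_valid(line: str) -> bool:
    """Return True when the whole line matches the expected layout."""
    return PATTERN.fullmatch(line) is not None


# (line, expected) pairs taken from the samples above, without annotations.
SAMPLES = [
    ("Landon Powell |Oakland|{(Catcher,2000)}|[on_base_percentage#0.297]", True),
    ("Martin Prado |Atlanta| {(Second_baseman,2002),(Infielder,2003),(Left_fielder)}"
     "|[games#258,hit_by_pitch#3]", False),   # year missing
    ("Martin Prado |Atlanta| {(Second_baseman,2002)(Infielder,2003)}"
     "|[games#258,hit_by_pitch#3]", False),   # no comma between tuples
    ("Martin Prado |Atlanta| {(Second_baseman,2002),(Infielder,2003)}"
     "[games#258,hit_by_pitch#3]", False),    # delimiter | missing
]

results = [(line, is_valid(line), expected) for line, expected in SAMPLES]
for line, actual, expected in results:
    status = "Valid" if actual else "Invalid"
    flag = "" if actual == expected else "  <-- differs from the marking"
    print(f"{status}: {line}{flag}")
```

Two limits of the pattern are worth knowing: [0-9\.]+ also accepts values such as 1.2.3, and the name group accepts exactly two alphabetic words, so hyphenated or three-part names are rejected.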
PigScript:
A = LOAD 'input.txt' AS line;
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^([A-Za-z]+\\s+[A-Za-z]+)\\s*\\|\\s*([A-Za-z]+)\\s*\\|\\s*(\\{(?:\\([A-Za-z_]+,[0-9]+\\))(?:,\\([A-Za-z_]+,[0-9]+\\))*\\})\\s*\\|\\s*(\\[(?:[A-Za-z_]+#[0-9\\.]+)(?:,[A-Za-z_]+#[0-9\\.]+)*\\])$')) AS (name:chararray,team:chararray,pos:bag{t:(p:chararray)},bat:map[]);
DUMP B;
Output: if an input line does not match the pattern, it is printed as an empty tuple.
(Jorge Posada,Yankees,{(Catcher,2000),(Designated_hitter,2001)},[games#1594,hit_by_pitch#65,grand_slams#7]) -->Valid
(Landon Powell,Oakland,{(Catcher,2000),(First_baseman,2001)},[on_base_percentage#0.297,games#26,home_runs#7]) -->Valid
() -->Invalid, year is missing
() -->Invalid, no comma between two tuples
() -->Invalid, comma before the first tuple
() -->Invalid, position is missing
() -->Invalid, delimiter | is missing
() -->Invalid, team name is missing
() -->Invalid, value is missing for the key games
(Landon Powell,Oakland,{(Catcher,2000)},[on_base_percentage#0.297]) -->Valid
(Landon Powell,Oakland,{(Catcher,2000),(First_baseman,2001),(test,3000)},[on_base_percentage#0.297,games#26,home_runs#7,test#1.2]) -->Valid
Could you add more samples, both valid and invalid, to validate the input? – 2014-11-15 18:59:44