If you need a complicated regular expression, you may also consider writing a tiny parser instead.
    $ cat r.awk
    BEGIN {
        re_wrd = "^[A-Za-z]+"   # what we consider a word
        re_sep = "^."           # the rest is a separator
    }

    function advance() {        # sets `tag' and `tok'; eats a part of `line'
        if (match(line, re_wrd))      tag = "wrd"
        else if (match(line, re_sep)) tag = "sep"
        tok  = substr(line, 1, RLENGTH)
        line = substr(line, RLENGTH + 1)
    }

    function process_sep() {    # copy to output
        ans = ans tok
    }

    function process_wrd() {
        sub(/^word/, "preword", tok)   # replace only at the beginning
        ans = ans tok
    }

    {
        line = $0; ans = tag = tok = ""
        while (length(line) > 0) {
            advance()
            # uncomment for tracing
            # print tag, "<" tok ">" | "cat 1>&2"
            if (tag == "sep")      process_sep()
            else if (tag == "wrd") process_wrd()
        }
        print ans
    }
Usage:

    $ echo 'preword...microsoftword word wordword,word.word-preword' | awk -f r.awk
    preword...microsoftword preword prewordword,preword.preword-preword
Trace (printed to stderr when the tracing line is uncommented):

    wrd <preword>
    sep <.>
    sep <.>
    sep <.>
    wrd <microsoftword>
    sep < >
    wrd <word>
    sep < >
    wrd <wordword>
    sep <,>
    wrd <word>
    sep <.>
    wrd <word>
    sep <->
    wrd <preword>
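
A quick way to check edge cases such as `foreword` or `swordfish` is to wrap the same tokenizer in a shell function and feed it a few words. The sketch below is illustrative only: the function name `preword` is made up here, the awk program is a condensed copy of `r.awk` (the two `process_*` functions are folded into the main loop), and it assumes a POSIX awk on the PATH.

```shell
# Hypothetical wrapper around a condensed copy of r.awk, inlined so the
# snippet runs on its own. Assumes a POSIX-compatible awk is installed.
preword() {
  awk '
    BEGIN {
      re_wrd = "^[A-Za-z]+"   # a word: a run of ASCII letters
      re_sep = "^."           # anything else, one character at a time
    }
    function advance() {      # sets tag and tok; consumes the front of line
      if (match(line, re_wrd)) tag = "wrd"
      else if (match(line, re_sep)) tag = "sep"
      tok  = substr(line, 1, RLENGTH)
      line = substr(line, RLENGTH + 1)
    }
    {
      line = $0; ans = ""
      while (length(line) > 0) {
        advance()
        # only word tokens are rewritten, and only at their start
        if (tag == "wrd") sub(/^word/, "preword", tok)
        ans = ans tok
      }
      print ans
    }
  '
}

echo 'foreword word wordword swordfish' | preword
# prints: foreword preword prewordword swordfish
```

Because `sub(/^word/, ...)` is applied to each token separately, only tokens that begin with `word` change; `foreword` and `swordfish` pass through untouched.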
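Should 'foreword' become 'forepreword'? Please clarify your requirements and show concise, testable sample input and expected output that cover all of your use cases.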