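Parsing delimited text with escape characters
I want to parse (in Ruby) what is effectively the UNIX passwd file format: comma delimiters, with an escape character \ such that anything escaped should be taken literally. I'm trying to use a regular expression for this, but I'm coming up short, even when using Oniguruma for lookahead/lookbehind assertions.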
Essentially, all of the following should work:
a,b,c # => ["a", "b", "c"]
\a,b\,c # => ["a", "b,c"]
a,b,c\
d # => ["a", "b", "c\nd"]
a,b\\\,c # => ["a", "b\\,c"]
Any ideas?
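To pin down the behaviour those examples describe, here is a minimal sketch of a character-by-character splitter. The method name `split_escaped` and its parameters are made up for illustration and do not come from any library. It assumes a trailing lone backslash can simply be dropped, and it is one possible reading of the examples rather than a tested parser for real files.

```ruby
# Hypothetical helper: split text on unescaped delimiters, taking the
# character that follows an escape character literally.
def split_escaped(text, delimiter = ",", escape = "\\")
  fields = ["".dup]
  escaped = false
  text.each_char do |ch|
    if escaped
      fields.last << ch      # escaped character is kept as-is
      escaped = false
    elsif ch == escape
      escaped = true         # drop the backslash, remember the state
    elsif ch == delimiter
      fields << "".dup       # unescaped delimiter starts a new field
    else
      fields.last << ch
    end
  end
  fields
end

p split_escaped('a,b,c')       # => ["a", "b", "c"]
p split_escaped('\a,b\,c')     # => ["a", "b,c"]
p split_escaped("a,b,c\\\nd")  # => ["a", "b", "c\nd"]
```

Because the state is tracked one character at a time, an even or odd run of backslashes before a comma is handled without any lookbehind.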
The first response looks pretty good. With a file containing
\a,,b\\\,c\,d,e\\f,\\,\
g
it gives:
[["\\a,"], [","], ["b\\\\\\,c\\,d,"], ["e\\\\f,"], ["\\\\,"], ["\\\ng\n"], [""]]
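Which is pretty close. I don't need the unescaping done on the first pass, as long as everything splits correctly on the commas. I tried Oniguruma and ended up with this (much longer) version: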
Oniguruma::ORegexp.new(%{
  (?:            # - begins with (but doesn't capture)
    (?<=\A)      #   - start of line
    |            #   - (or)
    (?<=,)       #   - a comma
  )
  (?:            # - contains (but doesn't capture)
    .*?          #   - any set of characters
    [^\\\\]?     #   - not ending in a slash
    (\\\\\\\\)*  #   - followed by an even number of slashes
  )*?
  (?:            # - ends with (but doesn't capture)
    (?=\Z)       #   - end of line
    |            #   - (or)
    (?=,))       #   - a comma
  },
  'mx'
).scan(s)
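Very nice, but it doesn't change the escaped characters into literal characters the way the OP wants. That probably can't be done within the regex. – 2010-02-12 21:06:08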
I think this modification works, and it doesn't capture the trailing comma itself: s.scan(/((?:\\.|[^,])*),?/m) – 2010-02-13 15:20:48
s.scan /((?:\\.|[^,])*)[,\n$]/mx seems a bit more robust. – 2010-02-13 15:34:47
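For what it's worth, the two-pass idea from these comments (split on unescaped commas first, unescape afterwards) could be sketched roughly as below. The names `FIELD_PATTERN` and `parse_escaped_fields` are hypothetical. The pattern is adapted from the comment above with one change that is my own assumption: a sentinel comma is appended to the input and the comma in the pattern is made mandatory, so that `scan` does not produce an extra empty match at the end of the string.

```ruby
# Adapted from the pattern in the comments; the terminating comma is
# required here because a sentinel comma is appended to the input.
FIELD_PATTERN = /((?:\\.|[^,])*),/m

# Hypothetical helper: first pass splits on unescaped commas,
# second pass turns each backslash-escaped character into a literal.
def parse_escaped_fields(s)
  raw_fields = (s + ",").scan(FIELD_PATTERN).flatten
  raw_fields.map { |field| field.gsub(/\\(.)/m) { $1 } }
end

# The sample file contents from the question, written as a Ruby string.
sample = "\\a,,b\\\\\\,c\\,d,e\\\\f,\\\\,\\\ng"
p parse_escaped_fields(sample)
# => ["a", "", "b\\,c,d", "e\\f", "\\", "\ng"]
```

The `m` flag matters in both passes: it lets `.` match the newline that follows a backslash, so an escaped line break stays inside its field.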