One way to solve this problem is to use lookahead assertions:
(?=("[^"]*"))|(?=('[^']*'))|(?=<([^<>]+)>)
Let's break the regex down to get a better view:
(?= # zero-width assertion, look ahead if there is ...
("[^"]*") # a double quoted string, group it in group number 1
) # end of lookahead
| # or
(?= # zero-width assertion, look ahead if there is ...
('[^']*') # a single quoted string, group it in group number 2
) # end of lookahead
| # or
(?= # zero-width assertion, look ahead if there is ...
<([^<>]+)> # match anything except <> between <> one or more times and group it in group number 3
) # end of lookahead
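As a rough sketch, the annotated pattern above can be written almost verbatim in Python using the `re.VERBOSE` flag. The names `TOKEN_RE` and `find_tokens` are made up for this illustration, and Python is only an assumption here since the question does not say which regex engine is in use.

```python
import re

# re.VERBOSE lets the pattern carry whitespace and comments,
# mirroring the commented breakdown above.
TOKEN_RE = re.compile(r"""
    (?= ("[^"]*") )     # lookahead, group 1 = double-quoted string
  | (?= ('[^']*') )     # lookahead, group 2 = single-quoted string
  | (?= <([^<>]+)> )    # lookahead, group 3 = text between < and >
""", re.VERBOSE)

def find_tokens(text):
    """Return (position, group number, captured text) for every match."""
    results = []
    for m in TOKEN_RE.finditer(text):
        for number, value in enumerate(m.groups(), start=1):
            if value is not None:
                results.append((m.start(), number, value))
    return results

print(find_tokens('<telerik:RadTab Text="RGB">'))
# [(0, 3, 'telerik:RadTab Text="RGB"'), (21, 1, '"RGB"')]
```

Every match itself is empty (zero-width); the useful text lives only in the capture groups.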
You might be thinking "what in the world is he doing?" No problem, I'll explain further why your regex fails.
We have the following string, <telerik:RadTab Text="RGB">:
<telerik:RadTab Text="RGB">
^ the regex engine starts here
  since there is no match with ("[^"]*")|('[^']*')|([^<>]+)
  it will look further!
<telerik:RadTab Text="RGB">
 ^ the regex engine will now take a look here
   it will check if there is "[^"]*", well obviously there isn't
   now since there is an alternation, the regex engine will
   check if there is '[^']*', meh same thing
   it will now check if there is [^<>]+, but hey it matches!
   So your regex engine will "eat" it like so
<telerik:RadTab Text="RGB">
 ^^^^^^^^^^^^^^^^^^^^^^^^^ and match this, by eating I mean it's advancing
   Now the regex engine is at this point
<telerik:RadTab Text="RGB">
                          ^ and obviously, there is no match
The problem is, you want it to "step" back to match "RGB"
The regex engine won't go back for you :(
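The failure traced above can be reproduced with a small sketch. It assumes Python's `re` module (the question's actual engine may differ) and uses the consuming pattern quoted in the walkthrough; `CONSUMING_RE` and `consumed_matches` are hypothetical names.

```python
import re

# The consuming pattern from the walkthrough (no lookaheads).
CONSUMING_RE = re.compile(r"""("[^"]*")|('[^']*')|([^<>]+)""")

def consumed_matches(text):
    """Return (start, end, groups) for each match the engine reports."""
    return [(m.start(), m.end(), m.groups())
            for m in CONSUMING_RE.finditer(text)]

for start, end, groups in consumed_matches('<telerik:RadTab Text="RGB">'):
    print(start, end, groups)
# 1 26 (None, None, 'telerik:RadTab Text="RGB"')
```

There is a single match: the third alternative consumes everything from index 1 to 26, so the engine never revisits "RGB" and group 1 stays empty.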
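That's why we use zero-width assertions with groups: a lookahead won't eat anything (it won't advance), and if you put a group inside the lookahead you still get your captured groups.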
<telerik:RadTab Text="RGB">
^ So when it comes here, it will match it with (?=<([^<>]+)>)
  but it won't eat the whole matched string
  Now obviously, the regex needs to continue to look for other matches
  So it comes here:
<telerik:RadTab Text="RGB">
 ^ no match
<telerik:RadTab Text="RGB">
  ^ no match
.....
until
<telerik:RadTab Text="RGB">
                     ^ hey there is a match using (?=("[^"]*"))
                       it will then advance further
<telerik:RadTab Text="RGB">
                      ^ no match
.... until it reaches the end
Of course, if you have a string like <telerik:RadTab Text="RGB'lol'">, it will still match 'lol' inside the double-quoted value and put it in group number 2.
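The nested-quote case can be checked with a short sketch, again assuming Python's `re` module; `PATTERN` and `captures_by_group` are illustrative names, not part of any library.

```python
import re

PATTERN = re.compile(r"""(?=("[^"]*"))|(?=('[^']*'))|(?=<([^<>]+)>)""")

def captures_by_group(text):
    """Collect the captured text per group number across all matches."""
    found = {1: [], 2: [], 3: []}
    for m in PATTERN.finditer(text):
        for number in found:
            if m.group(number) is not None:
                found[number].append(m.group(number))
    return found

sample = """<telerik:RadTab Text="RGB'lol'">"""
for number, values in captures_by_group(sample).items():
    print(number, values)
```

Because the lookaheads consume nothing, the three captures overlap: group 3 holds the whole tag body, group 1 the double-quoted value, and group 2 the single-quoted 'lol' nested inside it.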
Online demo
Regex rocks!
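Maybe you should use an XML parser. –
You could use multiple regexes – Alan
You might be able to do this with a lookbehind, but why not nest it (test for the quoted case first and, if there is none, check for '<>')? (You forgot to include the <> in your last search, so it will match everything in a line...) – beroe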