You can use this pattern:
$pattern = '~__\((["\'])(?<param1>(?>[^"\'\\\]+|\\\.|(?!\1)["\'])*)\1(?:,\s*(?<param2>\$[a-z0-9_-]+))?\);~si';
if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER))
print_r($matches);
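To see what this pattern captures, here is a minimal sketch that runs it unchanged over a small, hypothetical `$data` string containing two translation calls, one with a variable argument and one without. The loop and the `(none)` placeholder are only for illustration.

```php
<?php
// Hypothetical sample input: two __() calls, the second with a variable argument.
// Nowdoc syntax (<<<'SRC'), so the backslashes stay literal.
$data = <<<'SRC'
echo __("Hello \"world\"");
echo __('It\'s %s', $user_name);
SRC;

// The pattern from the answer, unchanged.
$pattern = '~__\((["\'])(?<param1>(?>[^"\'\\\]+|\\\.|(?!\1)["\'])*)\1(?:,\s*(?<param2>\$[a-z0-9_-]+))?\);~si';

if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // param2 is optional; when it did not participate it may be missing from $m.
        echo $m['param1'], ' | ', $m['param2'] ?? '(none)', "\n";
    }
}
```

This should print `Hello \"world\" | (none)` and then `It\'s %s | $user_name`: the escaped quotes stay inside `param1`, and the `(?!\1)` branch only lets the *other* quote character through unescaped.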
But as Jon noted, this kind of pattern can be hard to maintain. That is why I suggest rewriting it like this:
$pattern = <<<'LOD'
~
## definitions
(?(DEFINE)
(?<sqc> # content between single quotes
(?> [^'\\]+ | \\.)* #'
# can be written in a more efficient way, with an unrolled pattern:
# [^'\\]*+ (?:\\. [^'\\]*)*+
)
(?<dqc> # content between double quotes
(?> [^"\\]+ | \\.)* #"
)
(?<var> # variable
\$ [a-zA-Z0-9_-]+
)
)
## main pattern
__\(
(?| " (?<param1> \g<dqc>) " | ' (?<param1> \g<sqc>) ')
# note that once you define a named group in the first branch in a branch reset
# group, you don't have to include the name in other branches:
# (?| " (?<param1> \g<dqc>) " | ' (\g<sqc>) ') does the same. Even if the
# second branch succeeds, the capture group is named as in the first branch:
# only the position (number) of the groups is taken into account.
(?:, \s* (?<param2> \g<var>))?
\);
~xs
LOD;
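As a quick check of the rewritten version, the sketch below uses a condensed copy of the same pattern (comments removed, otherwise unchanged) on hypothetical input with one double-quoted and one single-quoted call. It shows that, thanks to the branch reset group `(?|...)`, both quote styles end up in the same `param1` key.

```php
<?php
// Condensed copy of the commented pattern above (comments removed).
// Nowdoc syntax (<<<'LOD'), so backslashes reach PCRE unchanged.
$pattern = <<<'LOD'
~
(?(DEFINE)
    (?<sqc> (?> [^'\\]+ | \\. )* )
    (?<dqc> (?> [^"\\]+ | \\. )* )
    (?<var> \$ [a-zA-Z0-9_-]+ )
)
__\(
(?| " (?<param1> \g<dqc>) " | ' (?<param1> \g<sqc>) ' )
(?: , \s* (?<param2> \g<var>) )?
\);
~xs
LOD;

// Hypothetical input: one double-quoted call and one single-quoted call.
$data = <<<'SRC'
__("Say \"hi\"");
__('It\'s %s', $name);
SRC;

if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // Both branches of the branch reset group fill the same 'param1' slot.
        echo $m['param1'], ' | ', $m['param2'] ?? '(none)', "\n";
    }
}
```

The groups declared inside `(?(DEFINE)...)` (`sqc`, `dqc`, `var`) also show up in `$matches`, but they stay empty: they are only called as subroutines through `\g<name>` and never capture anything themselves.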
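This simple change makes your pattern much more readable and easier to edit.
The subpatterns for the content between quotes are designed to handle escaped quotes. The idea is to consume every backslash together with the character that follows it (which may itself be a backslash), so that literal backslashes and escaped quotes are both matched correctly: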
\' # an escaped quote
\\' #'# an escaped backslash followed by a closing quote
\\\' # an escaped backslash followed by an escaped quote
\\\\' #'# two escaped backslashes followed by a closing quote
...
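The cases above can be checked with a small sketch: it wraps the single-quote content subpattern (`sqc`) in quotes, anchors it at the start of each hypothetical test line, and prints what ends up between the quotes. The names `$sq`, `$text` and `$results` are only for illustration.

```php
<?php
// The single-quote content subpattern (sqc), wrapped in quotes and anchored,
// so we can see where the closing quote is found.
$sq = <<<'RE'
~^'((?>[^'\\]+|\\.)*)'~
RE;

// One hypothetical case per line, in a nowdoc so the backslashes stay literal.
$text = <<<'TXT'
'a\'b' tail
'a\\'b' tail
'a\\\'b' tail
'a\\\\'b' tail
TXT;

$results = [];
foreach (explode("\n", $text) as $case) {
    if (preg_match($sq, $case, $m)) {
        $results[] = $m[1];   // content between the opening and closing quote
        echo $case, '  =>  content: ', $m[1], "\n";
    }
}
```

In the first and third lines the quote is escaped, so the content runs on to the quote after `b`. In the second and fourth lines the backslashes pair up with each other, so the quote right after them closes the string and the content stops before `b`.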
Details of the subpattern:
(?> # open an atomic group (once it has matched, backtracking into it is forbidden)
[^'\\]+ #'# all that is not a quote or a backslash
| # OR
\\. # an escaped character
)* # repeat the group zero or more times
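The comment in the `sqc` definition mentions an "unrolled" alternative. A minimal sketch comparing the two spellings, assuming the unrolled form `[^'\\]*+(?:\\.[^'\\]*+)*+` (normal characters, then any number of "escaped character + normal characters" runs, all possessive), on a few hypothetical lines:

```php
<?php
// Two spellings of "content between single quotes":
//  - the atomic alternation described above: (?> [^'\\]+ | \\. )*
//  - an unrolled loop: normal* (special normal*)*, with possessive quantifiers
$atomic = <<<'RE'
~'((?>[^'\\]+|\\.)*)'~
RE;
$unrolled = <<<'RE'
~'([^'\\]*+(?:\\.[^'\\]*+)*+)'~
RE;

// Hypothetical test lines, in a nowdoc so backslashes stay literal.
$text = <<<'TXT'
x = 'plain';
x = 'it\'s';
x = 'ends with \\';
x = '';
TXT;

$captured = [];
$same = true;
foreach (explode("\n", $text) as $line) {
    preg_match($atomic, $line, $a);
    preg_match($unrolled, $line, $u);
    $captured[] = $a[1];
    // Both spellings are expected to capture exactly the same content.
    $same = $same && $a[1] === $u[1];
    echo $line, '  =>  ', $a[1], "\n";
}
var_dump($same);
```

Both forms describe the same language. The unrolled one avoids re-entering an alternation for every run of ordinary characters, which is why it is usually described as the more efficient variant.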
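Don't use a regex, or at least not one that is supposed to do everything at once. Even if you get it to work in practice, the difficulty of proving that it always works and the maintenance nightmare whenever you need to change it are simply not worth it. – Jon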