2014-02-27 62 views
0

preg_match a PHP string containing escaped single or double quotes

I want to parse some PHP files that contain calls like these:

// form 1 
__('some string'); 
// form 2 
__('an other string I\'ve written with a quote'); 
// form 3 
__('an other one 
multiline'); 
// form 4 
__("And I want to handle double quotes too !"); 
// form 5 
__("And I want to handle double quotes too !", $second_parameter_may_happens); 

The following regular expression matches all of them except the second one:

preg_match_all('#__\((\'|")(.*)\1(?:,.*){0,1}\)#smU', $file_content); 
+1

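Don't use a regex, at least not one that is supposed to do everything at once. Even if you get it working in practice, the difficulty of proving that it always works and the maintainability nightmare of having to change it later are not worth it. – Jon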

Answers

2

You can use this pattern:

$pattern = '~__\((["\'])(?<param1>(?>[^"\'\\\]+|\\\.|(?!\1)["\'])*)\1(?:,\s*(?<param2>\$[a-z0-9_-]+))?\);~si'; 

if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) 
    print_r($matches); 
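For illustration, here is a minimal sketch that runs the pattern above against the five forms from the question. The `$data` sample and the `$found` array are assumptions made for this example and are not part of the original answer.

```php
<?php
// Hypothetical sample input: the five forms from the question (nowdoc, so
// PHP leaves the backslash in I\'ve untouched).
$data = <<<'PHP'
__('some string');
__('an other string I\'ve written with a quote');
__('an other one
multiline');
__("And I want to handle double quotes too !");
__("And I want to handle double quotes too !", $second_parameter_may_happens);
PHP;

$pattern = '~__\((["\'])(?<param1>(?>[^"\'\\\]+|\\\.|(?!\1)["\'])*)\1(?:,\s*(?<param2>\$[a-z0-9_-]+))?\);~si';

$found = [];
if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $found[] = [
            'param1' => $m['param1'],
            // param2 may be missing when the call has a single argument
            'param2' => $m['param2'] ?? null,
        ];
    }
}
print_r($found);
```

Note that `param1` keeps the backslash of `\'` exactly as it is written in the parsed file; unescaping it is left to the caller.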

But as Jon points out, this kind of pattern can be hard to maintain. That is why I suggest rewriting it like this:

$pattern = <<<'LOD' 
~ 
## definitions 
(?(DEFINE) 
    (?<sqc>  # content between single quotes 
     (?> [^'\\]+ | \\.)* #' 
     # can be written in a more efficient way, with an unrolled pattern: 
     # [^'\\]*+ (?:\\. [^'\\]*)*+
    ) 
    (?<dqc>  # content between double quotes 
     (?> [^"\\]+ | \\.)* #" 
    ) 
    (?<var>  # variable 
     \$ [a-zA-Z0-9_-]+ 
    ) 
) 

## main pattern 
__\(
(?| " (?<param1> \g<dqc>) " | ' (?<param1> \g<sqc>) ') 
# note that once you define a named group in the first branch in a branch reset 
# group, you don't have to include the name in other branches: 
# (?| " (?<param1> \g<dqc>) " | ' (\g<sqc>) ') does the same. Even if the
# second branch succeeds, the capture group will be named as in the first branch.
# Only the order of groups is taken into account.
(?:, \s* (?<param2> \g<var>))? 
\); 
~xs 
LOD; 

This simple change makes your pattern more readable and easier to edit.
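One thing to keep in mind with this layout: as far as I know, PCRE also numbers the groups declared inside `(?(DEFINE)...)`, so `param1` is no longer capture group 2 as it was in the one-line pattern. Here is a minimal sketch, with a condensed copy of the pattern and a hypothetical `$data` string, that reads the captures by name:

```php
<?php
$pattern = <<<'LOD'
~
(?(DEFINE)
    (?<sqc> (?> [^'\\]+ | \\. )* )   # group 1: content between single quotes
    (?<dqc> (?> [^"\\]+ | \\. )* )   # group 2: content between double quotes
    (?<var> \$ [a-zA-Z0-9_-]+ )      # group 3: variable
)
__\(
(?| " (?<param1> \g<dqc>) " | ' (?<param1> \g<sqc>) ' )   # group 4
(?: , \s* (?<param2> \g<var>) )?                          # group 5
\);
~xs
LOD;

// Hypothetical input; in a single-quoted PHP string \" stays a backslash
// followed by a double quote, and $var is not interpolated.
$data = '__("double \" quoted", $var);';

$matched = preg_match($pattern, $data, $m);

echo $m['param1'], "\n";   // first argument, by name
echo $m['param2'], "\n";   // second argument, by name
var_dump($m[4] === $m['param1']);   // the numeric index of param1 in this layout
```

Reading the captures by name (`$m['param1']`, `$m['param2']`) keeps the calling code independent of how many helper groups the definitions section contains.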

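The "content between quotes" subpatterns are designed to deal with escaped quotes. The idea is to always match a backslash together with the character that follows it (which can itself be a backslash), so that literal backslashes and escaped quotes are told apart correctly: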

\'   # an escaped quote 
\\'  #'# an escaped backslash and a quote 
\\\'   # an escaped backslash and an escaped quote 
\\\\'  #'# two escaped backslashes and a quote 
... 

Subpattern details:

(?>   # open an atomic group (inside which backtracking is forbidden)
    [^'\\]+ #'# all that is not a quote or a backslash 
    |   # OR 
    \\.  # an escaped character 
)*    # repeat the group zero or more times 
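To see the escape handling at work, here is a minimal sketch that applies the single-quote subpattern to the four cases listed above, and compares it with the unrolled variant mentioned in the pattern comments. The sample lines and the variable names are assumptions made for this example.

```php
<?php
// Hypothetical sample lines, one quoted string per line (nowdoc: PHP does
// not process the backslashes).
$inputs = <<<'TXT'
'a\'b' tail
'a\\' tail
'a\\\'b' tail
'a\\\\' tail
TXT;

// Atomic-group version of the "content between single quotes" subpattern.
$atomicPattern = <<<'RE'
~^'((?>[^'\\]+|\\.)*)'~m
RE;

// Unrolled version of the same idea, using possessive quantifiers.
$unrolledPattern = <<<'RE'
~^'([^'\\]*+(?:\\.[^'\\]*)*+)'~m
RE;

preg_match_all($atomicPattern, $inputs, $atomic);
preg_match_all($unrolledPattern, $inputs, $unrolled);

print_r($atomic[1]);                       // content found between the quotes
var_dump($atomic[1] === $unrolled[1]);     // both variants capture the same text
```

Because a backslash is always consumed together with the next character, a quote only closes the string when the backslashes before it pair up.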
+0

Thanks a lot for your answer, but your "extended version" doesn't work :/ – Asenar

+0

@Asenar: I have tested both versions and they work fine. I suggest you put `ini_set('display_errors', 'On');` at the beginning of your PHP script to find the error. –

+0

Mea culpa! It works, but I was still using $match[2] instead of $match['param1']. Thanks! – Asenar

0

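I finally found a solution based on my first expression, so I am posting it here, written in the extended style of Casimir, who gave a really great answer: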

$pattern = <<<'LOD' 
~
    __\(
    (?<quote>'|")   # catch the opening quote
    (?<param1>
        (?:
            [^'"]   # anything but quotes
            |
            \\'     # escaped single quotes are ok
            |
            \\"     # escaped double quotes are ok too
        )*
    )
    \k{quote}       # find the closing quote
    (?:,.*){0,1}    # catch any type of 2nd parameter
    \)
    # the x modifier allows comments, as long as they do not contain the delimiter :)
~smUx
LOD;
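A minimal usage sketch for this approach follows. Two things in it are assumptions of the sketch: the sample `$data`, and the use of `~` as the pattern delimiter so that `#` stays free for comments in extended mode.

```php
<?php
$pattern = <<<'LOD'
~
    __\(
    (?<quote>'|")        # opening quote
    (?<param1>
        (?: [^'"] | \\' | \\" )*
    )
    \k{quote}            # closing quote
    (?:,.*){0,1}         # any kind of second parameter
    \)
~smUx
LOD;

// Hypothetical input (nowdoc, so the backslashes reach the regex unchanged).
$data = <<<'TXT'
__('it\'s', 42);
__("say \"hi\"");
__("it's");
TXT;

preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m['param1'], "\n";
}
echo count($matches), " call(s) matched\n";
```

In this sketch the last call is not matched, because `[^'"]` rejects an unescaped single quote even inside a double-quoted string. If such strings occur in your files, the pattern with separate single-quote and double-quote subpatterns is probably the safer choice.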