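Parsing PHP define statements with Python regular expressions

I want to parse the define statements in PHP files using Python regular expressions. (In other words: I want to use Python to parse PHP files.)

The define statements I want to parse look like this: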

define("My_KEY", "My_Value l"); 
define('My_KEY', 'My_Value'); 
define( 'My_KEY' , "My_Value" );

So I came up with the following Python regular expression:

define\(\s*["']{1}(.[^'"]*)["']{1}\s*,\s*["']{1}(.[^'"]*)["']{1}\s*\)

This works great as long as there is no " or ' inside the define statement. For example, statements like these do not match:

define( 'My_KEY' , 'My\'_\'Value' ); 
define( 'My_KEY' , "My'_'Value" );

Any ideas on how to solve this?
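To make the failure concrete, here is a minimal sketch that runs the pattern from the question against one working statement and the two failing ones. The names ORIGINAL, php_lines and results are illustrative and not part of the question; raw strings are used so the backslashes stay in the text, as they would when reading a real PHP file.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(
    r"""define\(\s*["']{1}(.[^'"]*)["']{1}\s*,\s*["']{1}(.[^'"]*)["']{1}\s*\)"""
)

# Raw strings so the backslashes reach the regex exactly as PHP wrote them.
php_lines = [
    r"""define("My_KEY", "My_Value l");""",
    r"""define( 'My_KEY' , 'My\'_\'Value' );""",
    r"""define( 'My_KEY' , "My'_'Value" );""",
]

results = [ORIGINAL.search(line) for line in php_lines]
for line, match in zip(php_lines, results):
    print(line, "->", match.groups() if match else "no match")
```

The first statement matches. The other two do not, because [^'"]* stops at the first quote of either kind, and the closing parenthesis is then not where the pattern expects it.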

2016-05-14 manuel

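Is a regex required for the whole task? You could use a regex to find 'define(..)', then split the string between the parens, trim it, and so on, to get the values you need. –

See http://stackoverflow.com/questions/1352693/how-to-match-a-quoted-string-with-escaped-quotes-in-it – Barmar

@AndyG Yes, I could, but I want to learn more about how to use regular expressions, which is why I asked this question. – manuel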

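In Python:

import re
# Note: in this non-raw literal, \' is only an escaped quote; no backslash reaches the regex.
text = "define( 'My_KEY' , 'My\'_\'Value' )"
re.sub(r"""^define\(\s*['"]*(.*?)['"]*[\s,]+['"]*(.*?)['"]*\s*\)""", r'\2 ; \1', text)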

Output:

"My'_'Value ; My_KEY"

2016-05-14 13:20:26
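One detail of the re.sub snippet above is easy to miss: the test string is a normal Python literal, so the backslashes never reach the regex. The sketch below compares that literal with a raw one, which is closer to what reading a PHP file from disk would give. PATTERN, cooked and raw are names chosen here for illustration.

```python
import re

PATTERN = r"""^define\(\s*['"]*(.*?)['"]*[\s,]+['"]*(.*?)['"]*\s*\)"""

# Normal literal: Python turns \' into a plain quote before the regex runs.
cooked = "define( 'My_KEY' , 'My\'_\'Value' )"
# Raw literal: the backslashes survive, as they would in text read from a file.
raw = r"define( 'My_KEY' , 'My\'_\'Value' )"

cooked_result = re.sub(PATTERN, r"\2 ; \1", cooked)
raw_result = re.sub(PATTERN, r"\2 ; \1", raw)
print(cooked_result)
print(raw_result)
```

With the raw literal the captured value still contains the PHP escape sequences, so removing the backslashes would be a separate step.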

You can use something like:

import re
# subject is assumed to hold the PHP source text
result = re.findall(r"""^define\(\s*['"]*(.*?)['"]*[\s,]+['"]*(.*?)['"]*\s*\)""", subject, re.IGNORECASE | re.DOTALL | re.MULTILINE)

Regex101 Demo and Explanation

Matches:

MATCH 1 
1. [8-14] `My_KEY` 
2. [18-28] `My_Value l` 
MATCH 2 
1. [40-46] `My_KEY` 
2. [50-58] `My_Value` 
MATCH 3 
1. [73-79] `My_KEY` 
2. [88-96] `My_Value` 
MATCH 4 
1. [114-120] `My_KEY` 
2. [129-141] `My\'_\'Value` 
MATCH 5 
1. [159-165] `My_KEY` 
2. [174-184] `My'_'Value`
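For anyone who wants to reproduce the match list locally, here is a small self-contained sketch. It assumes subject holds the five statements from the question as a raw multi-line string; the regex101 offsets are not reproduced, only the captured groups.

```python
import re

# Assumed sample input: the five statements from the question.
subject = r"""define("My_KEY", "My_Value l");
define('My_KEY', 'My_Value');
define( 'My_KEY' , "My_Value" );
define( 'My_KEY' , 'My\'_\'Value' );
define( 'My_KEY' , "My'_'Value" );
"""

result = re.findall(
    r"""^define\(\s*['"]*(.*?)['"]*[\s,]+['"]*(.*?)['"]*\s*\)""",
    subject,
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)

# Each entry is a (key, value) tuple, one per define statement.
for key, value in result:
    print(key, "=>", value)
```

re.MULTILINE is what lets ^ match at the start of every line, so one call picks up all five statements.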

2016-05-14 13:33:45

This looks good. But how does it work? Why doesn't it stop when it reaches a " or '? – manuel

I don't know; where did you mention 'or' in your question? –
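On the "how does it work" question, a short sketch may help. The lazy group (.*?) grows one character at a time and stops only when the rest of the pattern (optional quotes, whitespace, closing parenthesis) fits right there; since . also matches quote characters, a quote alone does not end the capture. The same looseness has a cost, shown with a hypothetical key that contains a space. PATTERN, ok and split_early are names made up for this example.

```python
import re

PATTERN = re.compile(r"""^define\(\s*['"]*(.*?)['"]*[\s,]+['"]*(.*?)['"]*\s*\)""")

# Quotes inside the value are accepted because '.' matches them too.
ok = PATTERN.search(r"""define( 'My_KEY' , "My'_'Value" );""").groups()

# Hypothetical input with a space in the key: [\s,]+ matches that space,
# so the key is cut short and the rest ends up in the value.
split_early = PATTERN.search(r"""define('file path', "x");""").groups()

print(ok)
print(split_early)
```

So the pattern is convenient for simple keys, but it does not really track which quote opened a string.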

Using lookarounds, you could go with this monster of a regex:

define\(\s*(["'])(?P<key>.+?(?=\1))\1\s*, 
\s*(["'])(?P<value>.+?)(?=\3)(?<!\\)\3

See a demo on regex101.com.
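A possible way to use this pattern from Python is sketched below. It assumes the two lines of the expression are meant to be joined into one; PATTERN, php_source and pairs are illustrative names.

```python
import re

# The pattern from this answer, joined onto one line (an assumption about how
# the two lines are meant to be combined).
PATTERN = re.compile(
    r"""define\(\s*(["'])(?P<key>.+?(?=\1))\1\s*,\s*(["'])(?P<value>.+?)(?=\3)(?<!\\)\3"""
)

php_source = r"""define("My_KEY", "My_Value l");
define( 'My_KEY' , 'My\'_\'Value' );
define( 'My_KEY' , "My'_'Value" );
"""

# The named groups make the extraction independent of group numbering.
pairs = [(m.group("key"), m.group("value")) for m in PATTERN.finditer(php_source)]
print(pairs)
```

The lookbehind (?<!\\) rejects any closing quote that follows a backslash, so a value that ends in an escaped backslash, such as "C:\\temp\\", would not match with this pattern as written.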

2016-05-14 13:50:05 Jan

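Description

This regular expression will do the following:

Match lines that start with define and have a key and a value inside the parentheses
Capture the key and value strings without the surrounding quotes
Allow keys and values to be wrapped in either single or double quotes
Correctly handle escaped quotes
Avoid difficult edge cases such as:
- define('file path', "C:\\windows\\temp\\"); where an escaped backslash comes right before the closing quote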

Regular expression

Note: use the following flags: case-sensitive, global, multiline

^define\(\s*(['"])((?:\\\1|(?:(?!\1).))*)\1\s*,\s*(['"])((?:\\\3|(?:(?!\3).))*)\3\s*\);
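As a rough illustration of how this expression could be used from Python: re.MULTILINE stands in for the multiline flag, and finditer plays the role of the global flag. DEFINE_RE, php_source and constants are names chosen for this sketch.

```python
import re

DEFINE_RE = re.compile(
    r"""^define\(\s*(['"])((?:\\\1|(?:(?!\1).))*)\1\s*,\s*(['"])((?:\\\3|(?:(?!\3).))*)\3\s*\);""",
    re.MULTILINE,
)

php_source = r"""define("0 My_KEY", "0 My_Value l");
define( '3 My_KEY\\' , '3 My\'_\'Value' );
define( '4 My_KEY' , "4 My'_'Value\\" );
"""

# Group 2 is the key and group 4 is the value; groups 1 and 3 hold the quote type.
constants = {m.group(2): m.group(4) for m in DEFINE_RE.finditer(php_source)}
for key, value in constants.items():
    print(key, "=>", value)
```

The captured strings keep their PHP escape sequences (for example \' and \\); turning them into the final string values would be a separate step.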

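Capture groups

Capture group 0 gets the entire string
Capture group 1 gets the type of quote surrounding the key
Capture group 2 gets the key string inside the quotes
Capture group 3 gets the type of quote surrounding the value
Capture group 4 gets the value string inside the quotes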

Examples

Live demo

https://regex101.com/r/oP4sV0/1

Sample text

define("0 My_KEY", "0 My_Value l"); 
define('1 My_KEY', '1 My_Value'); 
define( '2 My_KEY' , "2 My_Value" ); 
define( '3 My_KEY\\' , '3 My\'_\'Value' ); 
define( '4 My_KEY' , "4 My'_'Value\\" );

Sample matches

[0][0] = define("0 My_KEY", "0 My_Value l"); 
[0][1] = " 
[0][2] = 0 My_KEY 
[0][3] = " 
[0][4] = 0 My_Value l 

[1][0] = define('1 My_KEY', '1 My_Value'); 
[1][1] = ' 
[1][2] = 1 My_KEY 
[1][3] = ' 
[1][4] = 1 My_Value 

[2][0] = define( '2 My_KEY' , "2 My_Value" ); 
[2][1] = ' 
[2][2] = 2 My_KEY 
[2][3] = " 
[2][4] = 2 My_Value 

[3][0] = define( '3 My_KEY\\' , '3 My\'_\'Value' ); 
[3][1] = ' 
[3][2] = 3 My_KEY\\ 
[3][3] = ' 
[3][4] = 3 My\'_\'Value 

[4][0] = define( '4 My_KEY' , "4 My'_'Value\\" ); 
[4][1] = ' 
[4][2] = 4 My_KEY 
[4][3] = " 
[4][4] = 4 My'_'Value\\

Explanation

NODE      EXPLANATION 
---------------------------------------------------------------------- 
^      the beginning of a "line" 
---------------------------------------------------------------------- 
    define     'define' 
---------------------------------------------------------------------- 
    \(      '(' 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \1: 
---------------------------------------------------------------------- 
    ['"]      any character of: ''', '"' 
---------------------------------------------------------------------- 
)      end of \1 
---------------------------------------------------------------------- 
    (      group and capture to \2: 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (0 or more 
          times (matching the most amount 
          possible)): 
---------------------------------------------------------------------- 
     \\      '\' 
---------------------------------------------------------------------- 
     \1      what was matched by capture \1 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     (?:      group, but do not capture: 
---------------------------------------------------------------------- 
     (?!      look ahead to see if there is not: 
---------------------------------------------------------------------- 
      \1      what was matched by capture \1 
---------------------------------------------------------------------- 
     )      end of look-ahead 
---------------------------------------------------------------------- 
     .      any character except \n 
---------------------------------------------------------------------- 
    )      end of grouping 
---------------------------------------------------------------------- 
    )*      end of grouping 
---------------------------------------------------------------------- 
)      end of \2 
---------------------------------------------------------------------- 
    \1      what was matched by capture \1 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    ,      ',' 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \3: 
---------------------------------------------------------------------- 
    ['"]      any character of: ''', '"' 
---------------------------------------------------------------------- 
)      end of \3 
---------------------------------------------------------------------- 
    (      group and capture to \4: 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (0 or more 
          times (matching the most amount 
          possible)): 
---------------------------------------------------------------------- 
     \\      '\' 
---------------------------------------------------------------------- 
     \3      what was matched by capture \3 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     (?:      group, but do not capture: 
---------------------------------------------------------------------- 
     (?!      look ahead to see if there is not: 
---------------------------------------------------------------------- 
      \3      what was matched by capture \3 
---------------------------------------------------------------------- 
     )      end of look-ahead 
---------------------------------------------------------------------- 
     .      any character except \n 
---------------------------------------------------------------------- 
    )      end of grouping 
---------------------------------------------------------------------- 
    )*      end of grouping 
---------------------------------------------------------------------- 
)      end of \4 
---------------------------------------------------------------------- 
    \3      what was matched by capture \3 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    \)      ')' 
---------------------------------------------------------------------- 
    ;      ';'
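The node-by-node table above can also be kept next to the pattern itself. The sketch below rewrites the same expression with re.VERBOSE so that each part carries its own comment; the layout and the names DEFINE_RE and groups are my own, and the pattern is otherwise unchanged.

```python
import re

DEFINE_RE = re.compile(
    r"""
    ^define\(\s*                 # literal define( at the start of a line
    (['"])                       # group 1: quote that opens the key
    ((?:\\\1|(?:(?!\1).))*)      # group 2: key, an escaped quote or any other char
    \1\s*,\s*                    # closing key quote, then the comma
    (['"])                       # group 3: quote that opens the value
    ((?:\\\3|(?:(?!\3).))*)      # group 4: value, same idea as group 2
    \3\s*\);                     # closing value quote, then ) and ;
    """,
    re.MULTILINE | re.VERBOSE,
)

match = DEFINE_RE.search(r"""define( '2 My_KEY' , "2 My_Value" );""")
groups = match.groups()
print(groups)
```

In verbose mode whitespace outside character classes is ignored, which is why the pattern can be split across lines without changing what it matches.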

2016-05-14 13:54:51
