2016-05-14 58 views
0

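Parsing PHP define statements with Python regular expressions

I want to parse the define statements in a PHP file using Python regular expressions. (In other words: I want to use Python to parse a PHP file.)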

The define statements I want to parse look like this:

define("My_KEY", "My_Value l"); 
define('My_KEY', 'My_Value'); 
define( 'My_KEY' , "My_Value" ); 

So I came up with the following Python regular expression:

define\(\s*["']{1}(.[^'"]*)["']{1}\s*,\s*["']{1}(.[^'"]*)["']{1}\s*\) 

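This works great as long as no " or ' is used inside the key or value of the define statement. For example, statements like these do not work: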

define( 'My_KEY' , 'My\'_\'Value' ); 
define( 'My_KEY' , "My'_'Value" ); 

Any ideas how to solve this?
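The failure described in the question can be reproduced with a short sketch. It runs the question's pattern unchanged against the five sample statements; the names `PATTERN`, `samples`, and `results` are only illustrative.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r"""define\(\s*["']{1}(.[^'"]*)["']{1}\s*,\s*["']{1}(.[^'"]*)["']{1}\s*\)"""
)

# Raw strings keep the backslashes exactly as they appear in the PHP source.
samples = [
    r'''define("My_KEY", "My_Value l");''',
    r'''define('My_KEY', 'My_Value');''',
    r'''define( 'My_KEY' , "My_Value" );''',
    r'''define( 'My_KEY' , 'My\'_\'Value' );''',
    r'''define( 'My_KEY' , "My'_'Value" );''',
]

results = [PATTERN.search(s) is not None for s in samples]
for s, ok in zip(samples, results):
    print("match   " if ok else "no match", s)
```

The last two statements do not match because `[^'"]*` stops at the first quote character of either kind, regardless of which quote opened the string or whether it is escaped.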

+1

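Is a regular expression required for the whole task? You could use a regex to find 'define(..)', then split the string between the parens, trim it, and so on, to get the values you need. –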

+0

See http://stackoverflow.com/questions/1352693/how-to-match-a-quoted-string-with-escaped-quotes-in-it – Barmar

+1

@AndyG Yes, I could, but I want to learn more about how to use regular expressions, which is why I asked this question. – manuel

Answers

0

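In Python:

import re
# In a non-raw string literal, \' is just ', so the text holds no backslashes.
text = "define( 'My_KEY' , 'My\'_\'Value' )"
# Swap the two captures: value first, then key.
re.sub(r"""^define\(\s*['"]*(.*?)['"]*[\s,]+['"]*(.*?)['"]*\s*\)""", r'\2 ; \1', text)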

Output:

"My'_'Value ; My_KEY" 
1

You can use something like:

import re 
result = re.findall(r"""^define\(\s*['"]*(.*?)['"]*[\s,]+['"]*(.*?)['"]*\s*\)""", subject, re.IGNORECASE | re.DOTALL | re.MULTILINE) 

Regex101 Demo and Explanation


Matches:

MATCH 1 
1. [8-14] `My_KEY` 
2. [18-28] `My_Value l` 
MATCH 2 
1. [40-46] `My_KEY` 
2. [50-58] `My_Value` 
MATCH 3 
1. [73-79] `My_KEY` 
2. [88-96] `My_Value` 
MATCH 4 
1. [114-120] `My_KEY` 
2. [129-141] `My\'_\'Value` 
MATCH 5 
1. [159-165] `My_KEY` 
2. [174-184] `My'_'Value` 
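
For reference, a self-contained sketch of the `findall` call from this answer. The `subject` text is an assumption: it is simply the five define statements from the question, held in a raw string so the backslashes survive.

```python
import re

PATTERN = re.compile(
    r"""^define\(\s*['"]*(.*?)['"]*[\s,]+['"]*(.*?)['"]*\s*\)""",
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)

# Assumed input: the sample statements from the question.
subject = r'''
define("My_KEY", "My_Value l");
define('My_KEY', 'My_Value');
define( 'My_KEY' , "My_Value" );
define( 'My_KEY' , 'My\'_\'Value' );
define( 'My_KEY' , "My'_'Value" );
'''

# findall returns one (key, value) tuple per define statement.
pairs = PATTERN.findall(subject)
for key, value in pairs:
    print(key, "=>", value)
```

Because the quotes are matched with the optional `['"]*`, this pattern does not check that the opening and closing quotes agree, and a key that contains a space or comma would be cut short at that character.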
+0

This looks good. But how does it work? Why doesn't it stop when it reaches a " or '? – manuel

+0

I'm not sure. Where did you mention 'or' in your question? –

1

Use lookarounds with this monster of a regular expression:

define\(\s*(["'])(?P<key>.+?(?=\1))\1\s*, 
\s*(["'])(?P<value>.+?)(?=\3)(?<!\\)\3 

a demo on regex101.com
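
A minimal sketch of how this pattern might be used from Python. It assumes the line break inside the pattern is not meant to match anything, so the pattern is compiled with `re.VERBOSE`; the variable names are illustrative.

```python
import re

# Verbose mode ignores the line break and indentation inside the pattern.
PATTERN = re.compile(
    r"""
    define\(\s*(["'])(?P<key>.+?(?=\1))\1\s*,
    \s*(["'])(?P<value>.+?)(?=\3)(?<!\\)\3
    """,
    re.VERBOSE,
)

line = r'''define( 'My_KEY' , 'My\'_\'Value' );'''
m = PATTERN.search(line)
print(m.group("key"), "=>", m.group("value"))
```

The lookbehind `(?<!\\)` refuses any closing quote that directly follows a backslash, which skips over `\'` inside the value. By the same rule, a value that ends in an escaped backslash right before its closing quote would likely fail to match.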

0

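Description

This regular expression will do the following:

  • Match statements that begin with define and contain a key and a value inside the parentheses
  • Capture the key and value strings, without the surrounding quotes
  • Allow keys and values to be wrapped in either single or double quotes
  • Correctly handle escaped quotes
  • Avoid difficult edge cases such as:
    • define('file path', "C:\\windows\\temp\\"); where an escaped backslash comes right before the closing quote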

The Regex

Note: use the following flags with this expression: case-sensitive, global, multiline

^define\(\s*(['"])((?:\\\1|(?:(?!\1).))*)\1\s*,\s*(['"])((?:\\\3|(?:(?!\3).))*)\3\s*\); 

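Capture Groups

  • Capture group 0 gets the entire string
  • Capture group 1 gets the quote type surrounding the key
  • Capture group 2 gets the key string inside the quotes
  • Capture group 3 gets the quote type surrounding the value
  • Capture group 4 gets the value string inside the quotes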
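
A short sketch of how the capture groups could be read from Python, assuming the multiline flag maps to `re.MULTILINE` and the "global" flag to iterating with `finditer`. The names `DEFINE_RE`, `php_source`, and `constants` are illustrative, and the input is the sample text used in this answer.

```python
import re

DEFINE_RE = re.compile(
    r"""^define\(\s*(['"])((?:\\\1|(?:(?!\1).))*)\1\s*,\s*(['"])((?:\\\3|(?:(?!\3).))*)\3\s*\);""",
    re.MULTILINE,
)

# Raw string so the PHP backslashes are kept exactly as written.
php_source = r'''
define("0 My_KEY", "0 My_Value l");
define('1 My_KEY', '1 My_Value');
define( '2 My_KEY' , "2 My_Value" );
define( '3 My_KEY\\' , '3 My\'_\'Value' );
define( '4 My_KEY' , "4 My'_'Value\\" );
'''

# Group 2 is the key and group 4 is the value; groups 1 and 3 hold the quote types.
constants = {m.group(2): m.group(4) for m in DEFINE_RE.finditer(php_source)}
for key, value in constants.items():
    print(key, "=>", value)
```

The captured strings still contain the PHP escape sequences (for example `\'` and `\\`); unescaping them would be a separate step.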

Examples

Live Demo

https://regex101.com/r/oP4sV0/1

Sample text

define("0 My_KEY", "0 My_Value l"); 
define('1 My_KEY', '1 My_Value'); 
define( '2 My_KEY' , "2 My_Value" ); 
define( '3 My_KEY\\' , '3 My\'_\'Value' ); 
define( '4 My_KEY' , "4 My'_'Value\\" ); 

Sample Matches

[0][0] = define("0 My_KEY", "0 My_Value l"); 
[0][1] = " 
[0][2] = 0 My_KEY 
[0][3] = " 
[0][4] = 0 My_Value l 

[1][0] = define('1 My_KEY', '1 My_Value'); 
[1][1] = ' 
[1][2] = 1 My_KEY 
[1][3] = ' 
[1][4] = 1 My_Value 

[2][0] = define( '2 My_KEY' , "2 My_Value" ); 
[2][1] = ' 
[2][2] = 2 My_KEY 
[2][3] = " 
[2][4] = 2 My_Value 

[3][0] = define( '3 My_KEY\\' , '3 My\'_\'Value' ); 
[3][1] = ' 
[3][2] = 3 My_KEY\\ 
[3][3] = ' 
[3][4] = 3 My\'_\'Value 

[4][0] = define( '4 My_KEY' , "4 My'_'Value\\" ); 
[4][1] = ' 
[4][2] = 4 My_KEY 
[4][3] = " 
[4][4] = 4 My'_'Value\\ 

Explanation

NODE      EXPLANATION 
---------------------------------------------------------------------- 
^      the beginning of a "line" 
---------------------------------------------------------------------- 
    define     'define' 
---------------------------------------------------------------------- 
    \(      '(' 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \1: 
---------------------------------------------------------------------- 
    ['"]      any character of: ''', '"' 
---------------------------------------------------------------------- 
)      end of \1 
---------------------------------------------------------------------- 
    (      group and capture to \2: 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (0 or more 
          times (matching the most amount 
          possible)): 
---------------------------------------------------------------------- 
     \\      '\' 
---------------------------------------------------------------------- 
     \1      what was matched by capture \1 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     (?:      group, but do not capture: 
---------------------------------------------------------------------- 
     (?!      look ahead to see if there is not: 
---------------------------------------------------------------------- 
      \1      what was matched by capture \1 
---------------------------------------------------------------------- 
     )      end of look-ahead 
---------------------------------------------------------------------- 
     .      any character except \n 
---------------------------------------------------------------------- 
    )      end of grouping 
---------------------------------------------------------------------- 
    )*      end of grouping 
---------------------------------------------------------------------- 
)      end of \2 
---------------------------------------------------------------------- 
    \1      what was matched by capture \1 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    ,      ',' 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \3: 
---------------------------------------------------------------------- 
    ['"]      any character of: ''', '"' 
---------------------------------------------------------------------- 
)      end of \3 
---------------------------------------------------------------------- 
    (      group and capture to \4: 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (0 or more 
          times (matching the most amount 
          possible)): 
---------------------------------------------------------------------- 
     \\      '\' 
---------------------------------------------------------------------- 
     \3      what was matched by capture \3 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     (?:      group, but do not capture: 
---------------------------------------------------------------------- 
     (?!      look ahead to see if there is not: 
---------------------------------------------------------------------- 
      \3      what was matched by capture \3 
---------------------------------------------------------------------- 
     )      end of look-ahead 
---------------------------------------------------------------------- 
     .      any character except \n 
---------------------------------------------------------------------- 
    )      end of grouping 
---------------------------------------------------------------------- 
    )*      end of grouping 
---------------------------------------------------------------------- 
)      end of \4 
---------------------------------------------------------------------- 
    \3      what was matched by capture \3 
---------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    \)      ')' 
---------------------------------------------------------------------- 
    ;      ';'
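
The node table above can also be carried into the code itself. The following is a sketch, not part of the original answer: it rewrites the same expression with `re.VERBOSE` so each piece has an inline comment. The name `DEFINE_VERBOSE` and the sample line are illustrative.

```python
import re

DEFINE_VERBOSE = re.compile(
    r"""
    ^define \( \s*
    (['"])                        # group 1: opening quote of the key
    ( (?: \\\1 | (?!\1) . )* )    # group 2: key; an escaped quote or any non-quote character
    \1 \s* , \s*                  # closing quote of the key, then the comma
    (['"])                        # group 3: opening quote of the value
    ( (?: \\\3 | (?!\3) . )* )    # group 4: value, same rule using the value's quote
    \3 \s* \) ;                   # closing quote, closing parenthesis, semicolon
    """,
    re.VERBOSE | re.MULTILINE,
)

# The edge case from the description: an escaped backslash before the closing quote.
sample = r'''define('file path', "C:\\windows\\temp\\");'''
match = DEFINE_VERBOSE.search(sample)
print(match.group(2), "=>", match.group(4))
```

In verbose mode unescaped whitespace in the pattern is ignored, so the spacing is purely for readability and the expression matches the same text as the single-line form.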