How to evaluate constraints using regular expressions? (PHP, regex)

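So, let's say I want to accept strings of the following form:
SomeColumn IN||<||>||= [123, 'hello', "wassup"]||123||'hello'||"yay!"
For example: MyValue IN ['value', 123] or MyInt > 123. I think you get the idea. What's bothering me is how to phrase this as a regex. I'm using PHP, and this is what I'm doing right now: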

    $temp = explode(';', $constraints);
    $matches = array();
    foreach ($temp as $condition) {
        preg_match('/(.+)[\t| ]+(IN|<|=|>|!)[\t| ]+([0-9]+|[.+]|.+)/', $condition, $matches[]);
    }
    foreach ($matches as $match) {
        if ($match[2] == 'IN') {
            preg_match('/(?:([0-9]+|".+"|\'.+\'))/', substr($match[3], 1, -1), $tempm);
            print_r($tempm);
        }
    }

Any help is really appreciated; my regex'ing is terrible.

Source

2012-11-13 Fabian Schneider

I assume your input looks something like this:

$string = 'SomeColumn IN [123, \'hello\', "wassup"];SomeColumn < 123;SomeColumn = \'hello\';SomeColumn > 123;SomeColumn = "yay!";SomeColumn = [123, \'hello\', "wassup"]';

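If you use preg_match_all, there is no need to explode the input or to build up the matches yourself. Note that the resulting two-dimensional array has its dimensions switched (first index is the capturing group, second index is the statement), but that is often desirable. Here is the code: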

preg_match_all('/(\w+)[\t ]+(IN|<|>|=|!)[\t ]+((\'[^\']*\'|"[^"]*"|\d+)|\[[\t ]*(?4)(?:[\t ]*,[\t ]*(?4))*[\t ]*\])/', $string, $matches); 

$statements = $matches[0]; 
$columns = $matches[1]; 
$operators = $matches[2]; 
$values = $matches[3];
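
As a side note on the switched dimensions: preg_match_all also accepts the PREG_SET_ORDER flag, which returns one array per matched statement instead of one array per capturing group. A minimal sketch, assuming a shortened sample input (the column name OtherColumn and the variable names $byGroup and $byStatement are made up for illustration):

```php
<?php
// Same pattern as above; only the flag and the shortened sample input are new.
$pattern = '/(\w+)[\t ]+(IN|<|>|=|!)[\t ]+((\'[^\']*\'|"[^"]*"|\d+)|\[[\t ]*(?4)(?:[\t ]*,[\t ]*(?4))*[\t ]*\])/';
$string  = 'SomeColumn IN [123, \'hello\', "wassup"];OtherColumn < 123';

// Default order (PREG_PATTERN_ORDER): $byGroup[1] holds every column,
// $byGroup[2] every operator, and so on.
preg_match_all($pattern, $string, $byGroup);

// PREG_SET_ORDER: one array per statement, each holding its own groups.
preg_match_all($pattern, $string, $byStatement, PREG_SET_ORDER);

foreach ($byStatement as $m) {
    // $m[1] = column, $m[2] = operator, $m[3] = value
    echo $m[1], ' | ', $m[2], ' | ', $m[3], PHP_EOL;
}
```

Which order is more convenient depends on whether you process the statements one by one or column-wise.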

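There will also be a $matches[4], but it has no real meaning and is only used inside the regular expression. First, a few things you did wrong in your attempt: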

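(.+) will consume as much as possible, and any character. So if a string value contains something that looks like IN 13, your first repetition might consume everything up to there and return it as the column. It also allows whitespace and = inside column names. There are two ways around this: either make the repetition "ungreedy" by appending ?, or, even better, restrict the allowed characters so that you cannot go past the desired delimiter. In my regex I only allow letters, digits and underscores (\w) for column identifiers.
[\t| ] mixes up two concepts: alternation and character classes. What it does is "match a tab, a pipe or a space". In a character class you simply write all the characters without separating them. Alternatively you could have written (\t| ), which would be equivalent in this case.
[.+] I don't know what you were trying to accomplish with this, but it matches either a literal . or a literal +. Again, it would be useful to restrict the allowed characters and to check that the quotes match up correctly (to avoid 'some string").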
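
The first two points can be checked with a few lines of PHP. This is only a sketch: the input string and the variable names are hypothetical, and the patterns are simplified versions of the one from the question.

```php
<?php
// Hypothetical input whose string value happens to contain " = ".
$condition = 'Name = "x = 1"';

// Greedy (.+) grabs as much as it can and only backs off to the LAST operator.
preg_match('/(.+)[\t ]+(IN|<|=|>|!)[\t ]+(.+)/', $condition, $greedy);
// $greedy[1] is 'Name = "x' rather than 'Name'

// Restricting the column to \w+ stops at the first delimiter.
preg_match('/(\w+)[\t ]+(IN|<|=|>|!)[\t ]+(.+)/', $condition, $strict);
// $strict[1] is 'Name' and $strict[3] is '"x = 1"'

// Inside a character class the pipe is an ordinary character.
$withPipe    = preg_match('/^[\t| ]$/', '|'); // 1: the class accepts a literal |
$withoutPipe = preg_match('/^[\t ]$/', '|');  // 0: only tab and space are accepted
```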

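Now for an explanation of my own regex (you can copy this version into your code as well; it will work just fine, and you get the explanation as comments in your code):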

preg_match_all('/ 
    (\w+)   # match an identifier and capture in $1 
    [\t ]+   # one or more tabs or spaces 
    (IN|<|>|=|!) # the operator (capture in $2) 
    [\t ]+   # one or more tabs or spaces 
    (    # start of capturing group $3 (the value) 
     (   # start of subpattern for single-valued literals (capturing group $4) 
      \'  # literal quote 
      [^\']* # arbitrarily many non-quote characters, to avoid going past the end of the string 
      \'  # literal quote 
     |   # OR 
      "[^"]*" # equivalent for double-quotes 
     |   # OR 
      \d+  # a number 
     )   # end of subpattern for single-valued literals 
    |    # OR (arrays follow) 
     \[   # literal [ 
     [\t ]*  # zero or more tabs or spaces 
     (?4)  # reuse subpattern no. 4 (any single-valued literal) 
     (?:   # start non-capturing subpattern for further array elements 
      [\t ]* # zero or more tabs or spaces 
      ,  # a literal comma 
      [\t ]* # zero or more tabs or spaces 
      (?4) # reuse subpattern no. 4 (any single-valued literal) 
     )*   # end of additional array element; repeat zero or more times 
     [\t ]*  # zero or more tabs or spaces 
     \]   # literal ] 
    )    # end of capturing group $3 
    /x',
    $string, 
    $matches);

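This makes use of PCRE's recursion feature, which lets you reuse a subpattern (or the whole regular expression) with (?n), where n is the same number you would use for a backreference.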
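
To make the difference from a backreference concrete, here is a small sketch with two made-up patterns: (?1) runs the pattern of group 1 again, while \1 demands the exact text that group 1 captured.

```php
<?php
// (?1) re-runs the PATTERN of group 1; \1 requires the same TEXT again.
$reuse   = '/^(\d+|"[^"]*")[\t ]*,[\t ]*(?1)$/';
$backref = '/^(\d+|"[^"]*")[\t ]*,[\t ]*\1$/';

var_dump(preg_match($reuse,   '123, "abc"')); // int(1): second item is any literal
var_dump(preg_match($backref, '123, "abc"')); // int(0): second item would have to be 123
var_dump(preg_match($backref, '123, 123'));   // int(1)
```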

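There are three major things I can think of that could be improved in this expression:

It does not allow floating-point numbers.
It does not allow escaped quotes (if your value is 'don\'t do this', I would only capture 'don\'). This can be solved using a negative lookbehind.
It does not allow empty arrays as values (this could easily be solved by wrapping all the array elements in a subpattern and making it optional with ?).

I included none of these because I was not sure whether they apply to your problem, and I thought the regex was already complex enough to present here.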
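
For reference, one possible way to lift the three limitations listed above is sketched below. It is not part of the original pattern and rests on a few assumptions: decimals have the simple form 12.5 (no sign, no exponent), a backslash escapes whatever character follows it, and, instead of the negative lookbehind mentioned above, escaped quotes are handled with the alternation (?:[^'\\]|\\.)*. The sample input is made up.

```php
<?php
// A nowdoc keeps the backslashes exactly as written, so no PHP-level escaping is needed.
$pattern = <<<'REGEX'
/
(\w+) [\t ]+ (IN|<|>|=|!) [\t ]+
(
  (                               # group 4: one single-valued literal
      '(?:[^'\\]|\\.)*'           # single quotes; a backslash escapes the next character
    | "(?:[^"\\]|\\.)*"           # same for double quotes
    | \d+(?:\.\d+)?               # integer or simple decimal such as 12.5
  )
|
  \[ [\t ]*
  (?: (?4) (?: [\t ]* , [\t ]* (?4) )* [\t ]* )?   # the whole element list is optional
  \]
)
/x
REGEX;

// Hypothetical input exercising all three cases.
$input = "Title = 'don\\'t do this';Price > 12.5;Tags IN []";
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' ', $m[2], ' ', $m[3], PHP_EOL;
}
```

The captured values still contain their quotes and backslashes; unescaping them would be a separate step.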

In general, regular expressions are not powerful enough to do proper language parsing anyway. It is usually better to write a parser.
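
To give an idea of what such a parser could look like for this small grammar, here is a rough sketch. Everything in it is an assumption made for illustration: the function names tokenize and parseConstraints, the token kinds, and the choice to return each constraint as a [column, operator, value] triple. It only supports integers and unescaped strings, like the regex above.

```php
<?php
// Turn the input into [kind, text] tokens; kinds and names are illustrative.
function tokenize(string $input): array
{
    $pattern = '/\G[\t ]*(?:(?<op>IN\b|[<>=!])|(?<ident>[A-Za-z_]\w*)|(?<number>\d+)'
             . '|(?<string>\'[^\']*\'|"[^"]*")|(?<punct>[\[\],;]))/';
    $input  = rtrim($input);
    $tokens = [];
    $offset = 0;
    while ($offset < strlen($input)) {
        if (!preg_match($pattern, $input, $m, 0, $offset)) {
            throw new InvalidArgumentException("Unexpected input at offset $offset");
        }
        foreach (['op', 'ident', 'number', 'string', 'punct'] as $kind) {
            if (isset($m[$kind]) && $m[$kind] !== '') {
                // Punctuation uses its own character as the kind, e.g. '['.
                $tokens[] = [$kind === 'punct' ? $m[$kind] : $kind, $m[$kind]];
                break;
            }
        }
        $offset += strlen($m[0]);
    }
    return $tokens;
}

// Parse "column operator value" statements separated by ";".
function parseConstraints(string $input): array
{
    $tokens = tokenize($input);
    $i = 0;
    // Kind of the current token, or 'end' when the input is exhausted.
    $kindAt = function () use ($tokens, &$i): string {
        return $tokens[$i][0] ?? 'end';
    };
    // Consume the current token, insisting on one of the given kinds.
    $take = function (string ...$kinds) use ($tokens, &$i, $kindAt): string {
        if (!in_array($kindAt(), $kinds, true)) {
            throw new InvalidArgumentException(
                'Expected ' . implode(' or ', $kinds) . ', got ' . $kindAt()
            );
        }
        return $tokens[$i++][1];
    };
    // A single-valued literal: an integer or a quoted string without its quotes.
    $literal = function () use ($take, $kindAt) {
        $isNumber = $kindAt() === 'number';
        $text = $take('number', 'string');
        return $isNumber ? (int) $text : substr($text, 1, -1);
    };

    $constraints = [];
    while ($kindAt() !== 'end') {
        $column   = $take('ident');
        $operator = $take('op');
        if ($kindAt() === '[') {
            $take('[');
            $value = [];
            if ($kindAt() !== ']') {          // empty arrays are allowed
                $value[] = $literal();
                while ($kindAt() === ',') {
                    $take(',');
                    $value[] = $literal();
                }
            }
            $take(']');
        } else {
            $value = $literal();
        }
        $constraints[] = [$column, $operator, $value];
        if ($kindAt() !== 'end') {
            $take(';');
        }
    }
    return $constraints;
}

print_r(parseConstraints('SomeColumn IN [123, \'hello\', "wassup"];MyInt > 123'));
```

Unlike the single regex, this version rejects malformed input with an exception instead of silently skipping it, and each rule of the grammar can be extended on its own.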

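Since you said your regex'ing is terrible: although regular expressions look like a lot of black magic because of their unusual syntax, they are not that hard to understand once you take the time to get your head around their basic concepts. I can recommend this tutorial. It really takes you all the way through!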

Source

2012-11-13 22:11:15
