Parsing markup into an abstract syntax tree using regular expressions

This question is a follow-up to: Recursive processing of markup using Regular Expression and DOMDocument

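The code in the accepted answer was a great help in understanding how to build a basic syntax tree. However, I am now having trouble tightening the regular expression so that it matches only my syntax, and not {. or {{. Ideally it would match only my syntax, which is: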

{<anchor>} 
{!image!} 
{*strong*} 
{/emphasis/} 
{|code|} 
{-strikethrough-} 
{>small<}

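Two of the tags, a and small, also need a closing character that differs from the opening one. I tried modifying $re_closetag from the original code sample to reflect this, but it still matches too much text.

For example: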

http://www.google.com/>} bang 
smäll<} boom

My test string is:

tëstïng {{ 漢字/漢字 }} testing {<http://www.google.com/>} bang {>smäll<} boom {* strông{/ ëmphäsïs {- strïkë {| côdë |} -} /} *} {*wôw*} 1, 2, 3

Source

2013-04-10 esryl

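You can control this either in the regular expression itself or after the match.

In the regular expression, what counts as an "open" tag is controlled by this part of $re_next: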

(?:\{(?P<opentag>[^{\s])) # match an open tag 
     #which is "{" followed by anything other than whitespace or another "{"

At the moment it accepts any character that is not { or whitespace. Just change it to:

(?:\{(?P<opentag>[<!*/|>-]))

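Now it only looks for your specific open tags.

The close-tag part only ever matches one character at a time, and which character depends on the tag that is open in the current context. (That is what the $opentag parameter is for.) So to match a pair of different characters, just change the $opentag value passed to the recursive call. For example: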
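
As a quick check of the tightened character class, here is a minimal sketch that runs only the open-tag fragment over the test string from the question. The ~ delimiter, the u flag, and the names $re_open and $found are assumptions made for this illustration. They are not taken from the original $re_next.

```php
<?php
// Hypothetical stand-alone check: only the open-tag fragment is exercised here,
// not the full $re_next pattern from the original answer.
$re_open = '~\{(?P<opentag>[<!*/|>-])~u';

$s = 'tëstïng {{ 漢字/漢字 }} testing {<http://www.google.com/>} bang {>smäll<} boom '
   . '{* strông{/ ëmphäsïs {- strïkë {| côdë |} -} /} *} {*wôw*} 1, 2, 3';

preg_match_all($re_open, $s, $m);

// Open-tag characters in order of appearance; "{{" and "{ " are not among them.
$found = $m['opentag'];
echo implode(' ', $found), "\n"; // prints: < > * / - | *
```

Because the hyphen is the last character inside the class, it is taken literally rather than as a range.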

    if (isset($m['opentag']) && $m['opentag'][1] !== -1) {
        list($newopen, $_) = $m['opentag'];

        // change the close character to look for in the new context
        if ($newopen === '>') $newopen = '<';
        else if ($newopen === '<') $newopen = '>';

        // the recursive call uses the character now held in $newopen as its close character
        list($subast, $offset) = str_to_ast($s, $offset, array(), $newopen);
        // note: for < and > the node is therefore labelled with the swapped character
        $ast[] = array($newopen, $subast);
    } else if (isset($m['text']) && $m['text'][1] !== -1) {
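
If more tags ever need asymmetric delimiters, the two if branches could be folded into a small lookup. This is only a sketch: close_char_for() is a hypothetical helper, not part of the original answer's code.

```php
<?php
// Hypothetical helper: returns the character that closes a given open-tag character.
function close_char_for(string $open): string
{
    // Only the asymmetric pairs are listed; any other tag closes with the same character.
    static $pairs = array('<' => '>', '>' => '<');
    return isset($pairs[$open]) ? $pairs[$open] : $open;
}
```

In the recursive call this would be passed as the last argument, for example str_to_ast($s, $offset, array(), close_char_for($newopen)). That leaves $newopen untouched, so the node could be labelled with the tag that was actually opened.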

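Alternatively, you can leave the pattern as it is and decide what to do with the match after the fact. For example, if you match an @ character but {@ is not an allowed open tag, you could raise a parse error, or simply treat it as a text node (add array('#text', '{@') to the ast), or anything in between.

Source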
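
The after-the-fact approach could look roughly like this. It is a sketch under assumptions: node_for_open(), its $strict flag, and the $allowed list are hypothetical names introduced for illustration. Only the array('#text', ...) node shape is borrowed from the text above.

```php
<?php
// Hypothetical post-match check: decide what to do with a captured open character.
function node_for_open(string $open, bool $strict = false)
{
    // Assumed list of permitted open-tag characters, taken from the question's syntax.
    $allowed = array('<', '!', '*', '/', '|', '-', '>');

    if (in_array($open, $allowed, true)) {
        return null; // allowed tag: the caller would recurse as usual
    }
    if ($strict) {
        // Strict mode: report the unknown tag as a parse error.
        throw new UnexpectedValueException('Unknown open tag: {' . $open);
    }
    // Lenient mode: keep the two characters as literal text.
    return array('#text', '{' . $open);
}
```

A caller could append the returned array to $ast when it is not null, and otherwise continue with the recursive call.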

2013-04-10 20:01:44
