preg_match_all and nested matches

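I am working on a template system and have run into a problem.

The plan is to create HTML documents with [@tags] in them. I could just use str_replace (and loop through all the tags to replace them), but I want to take this a bit further ;-)

I want to allow nested tags, and allow parameters with each tag: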

[@title|You are looking at article [@articlenumber] [@articlename]]

I want to get the following result with preg_match_all:

[0] title|You are looking at article [@articlenumber] [@articlename] 
[1] articlenumber 
[2] articlename

My script will split the parameters on |. The output from my script will look like this:

<div class='myTitle'>You are looking at article 001 MyProduct</div>
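For illustration, here is a minimal sketch of how the example input could be turned into this output without a recursive pattern: replace only the innermost tags (those with no brackets inside) and repeat until nothing changes. It is written in Python rather than PHP, and the handler table and the names INNERMOST_TAG, HANDLERS, expand_tag and render are hypothetical, not part of the original script.

```python
import re

# Matches a tag whose contents contain no brackets, i.e. an innermost tag.
INNERMOST_TAG = re.compile(r"\[@([^\[\]]*)\]")

# Hypothetical handlers: tag name -> function taking the parameter string.
HANDLERS = {
    "articlenumber": lambda param: "001",
    "articlename": lambda param: "MyProduct",
    "title": lambda param: "<div class='myTitle'>" + param + "</div>",
}

def expand_tag(match):
    # Split "name|parameter" on the first "|"; param is "" when absent.
    name, _, param = match.group(1).partition("|")
    return HANDLERS[name](param)

def render(template):
    # Replace innermost tags repeatedly until the text stops changing.
    previous = None
    while previous != template:
        previous = template
        template = INNERMOST_TAG.sub(expand_tag, template)
    return template

template = "[@title|You are looking at article [@articlenumber] [@articlename]]"
print(render(template))
# prints: <div class='myTitle'>You are looking at article 001 MyProduct</div>
```

The sketch assumes every tag name has a handler (an unknown name raises KeyError) and that handler output never contains new [@...] tags. The same loop should be expressible in PHP with preg_replace_callback.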

The problem is that I am not experienced with regular expressions. My patterns give almost what I want, but they have trouble with the nested parameters.

\[@(.*?)\]

stops at the ] of articlenumber.
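A quick way to see this behaviour is the sketch below. It uses Python's re module instead of preg_match_all (an assumption for illustration; the lazy quantifier behaves the same way for this simple pattern): the first match ends at the first ] it meets, which belongs to the nested tag.

```python
import re

text = "[@title|You are looking at article [@articlenumber] [@articlename]]"

# The lazy .*? stops at the first "]", so the outer tag is cut short.
matches = re.findall(r"\[@(.*?)\]", text)
print(matches)
# prints: ['title|You are looking at article [@articlenumber', 'articlename']
```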

\[@(.*?)(((?R)|.)*?)\]

is closer, but it does not capture articlenumber; https://regex101.com/r/UvH7zi/1

I hope someone can help me! Thanks in advance!

Source

2017-10-09 Remi Romme

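I believe it is time to use a proper HTML parser, such as http://simplehtmldom.sourceforge.net/ ;) Here is a summary of PCRE recursive patterns, although they get out of hand quickly: http://www.rexegg.com/regex-recursion.html.

You cannot do this with plain Python regular expressions. You are looking for a feature similar to the "balancing groups" of the .NET regex engine, which allows nested matches.

Take a look at PyParsing, which allows nested expressions (from pyparsing import nestedExpr):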

import pyparsing as pp 
text = '{They {mean to {win}} Wimbledon}' 
print(pp.nestedExpr(opener='{', closer='}').parseString(text))

The output is:

[['They', ['mean', 'to', ['win']], 'Wimbledon']]

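Unfortunately, this does not work well with your example. I think you need a better grammar.

You can experiment with a QuotedString definition, but the input still has to be adapted: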

import pyparsing as pp 
single_value = pp.QuotedString(quoteChar="'", endQuoteChar="'") 
parser = pp.nestedExpr(opener="[", closer="]", 
         content=single_value, 
         ignoreExpr=None) 

example = "['@title|You are looking at article' ['@articlenumber'] ['@articlename']]" 
print(parser.parseString(example, parseAll=True))

Source

2017-10-09 08:12:04 wp78de

Using your original pattern, the closest I could get to your desired output is: '\[@(.*?)(\b((?R)|.* *)* \]' – wp78de

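wp78de: this is the closest to what I described. The problem is that when another tag is embedded in the title, it is not found, because the number of parameters is not dynamic. But your answer is very close to what I need.

And I am sorry for not mentioning my programming language: I use PHP. For now I have created a parser: '- get all opening tags and put their strpos in an array - loop through all start positions of the opening tags - look for the next closing tag; is it before the next opening tag? Then the tag is complete - if the closing tag comes after an opening tag, skip that one and look for the next (and keep checking for opening tags in between)'. This way I can find all complete tags and replace them. But it took about 50 lines of code and multiple loops, so a single preg_match would be great ;-)

Here is my code: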

@\w+\|[\w\s]+\[@(\w+)]\s+\[@(\w+)]

https://regex101.com/r/UvH7zi/3

Source

2017-10-09 09:04:48 minhung

For now I have created a parser:

- Get all opening tags and put their strpos in an array.
- Loop through all start positions of the opening tags.
- Look for the next closing tag. Is it before the next opening tag? Then the tag is complete.
- If the closing tag comes after an opening tag, skip that one and look for the next (and keep checking for opening tags in between).

This way I can find all complete tags and replace them. But it took about 50 lines of code and multiple loops, so a single preg_match would be great ;-)
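The same result can also be reached in a single pass with a stack instead of several loops. The sketch below is not the 50-line PHP parser described above; it is a swapped-in stack-based variant in Python, and the function name find_tags is hypothetical. It assumes every ] closes the innermost open [@ tag and silently drops tags that are never closed.

```python
def find_tags(text):
    """Return the contents of every balanced [@...] tag, in opening order."""
    results = []   # one slot per opening tag, filled when the tag closes
    stack = []     # (index where the contents start, slot index in results)
    i = 0
    while i < len(text):
        if text.startswith("[@", i):
            # Reserve a slot now so outer tags come before nested ones.
            results.append(None)
            stack.append((i + 2, len(results) - 1))
            i += 2
        elif text[i] == "]" and stack:
            # A closing bracket completes the most recently opened tag.
            start, slot = stack.pop()
            results[slot] = text[start:i]
            i += 1
        else:
            i += 1
    # Drop slots of tags that were opened but never closed.
    return [tag for tag in results if tag is not None]

text = "[@title|You are looking at article [@articlenumber] [@articlename]]"
for index, tag in enumerate(find_tags(text)):
    print(index, tag)
```

For the example input this yields the outer tag first, followed by articlenumber and articlename, which is the order asked for in the question.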

Source

2017-10-09 12:46:27

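I am typing this on my phone, so there may be some mistakes, but what you want can be achieved quite easily by putting a lookahead into your expression: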

(?=\\[(@(?:\\[(?1)\\]|.)*)\\])

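Edit: yes, it works, here you go: https://regex101.com/r/UvH7zi/4

Because (?=) does not consume characters, the pattern looks for and captures the contents of every "[@*]" substring in the subject, recursively checking whether the contents themselves contain balanced groups, if any.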
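The effect of the non-consuming lookahead alone can be seen in the following sketch. It uses Python's standard re module, which does not support the (?1) recursion of PCRE, so the balance check is deliberately left out; the two patterns here are simplified stand-ins, not the pattern from this answer.

```python
import re

text = "[@title|You are looking at article [@articlenumber] [@articlename]]"

# Consuming version: the single greedy match swallows the nested tags.
consuming = re.findall(r"\[@(.*)\]", text)

# Lookahead version: nothing is consumed, so every "[@" gets its own match.
overlapping = re.findall(r"(?=\[@(.*)\])", text)

print(len(consuming), len(overlapping))
# prints: 1 3
```

Without recursion the inner captures run on to the last ] of the subject (for example 'articlename]'), which is the part the recursive (?1) call in the PCRE pattern is there to fix.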

Source

2017-10-10 18:53:50 jaytea
