Regex pattern to remove parentheses (and any parentheses nested inside them)

The input is the first paragraph of a Wikipedia page. I want to remove the parentheses and everything between them.

However, sometimes (often), the HTML content inside the parentheses itself contains one or more parentheses, usually in a link's href="".

Take the following:

<p> 
    The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> (from Greek σαρξ <i>sarx</i>, flesh, and πτερυξ <i>pteryx</i>, fin) – sometimes considered synonymous with <b>Crossopterygii</b> ("fringe-finned fish", from Greek κροσσός <i>krossos</i>, fringe) – constitute a <a href="/wiki/Clade" title="Clade">clade</a> (traditionally a <a href="/wiki/Class_(biology)" title="Class (biology)">class</a> or subclass) of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>. 
</p>

I want the end result to be:

<p> 
    The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> – sometimes considered synonymous with <b>Crossopterygii</b> – constitute a <a href="/wiki/Clade" title="Clade">clade</a> of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>. 
</p>

But when I use the following preg_replace pattern it doesn't work, because it gets confused by the parentheses inside the parentheses.

public function removeParentheses($content) { 

    $pattern = '@\(.*?\)@'; 
    $content = preg_replace($pattern, '', $content); 
    $content = str_replace(' .', '.', $content); 
    $content = str_replace('  ', ' ', $content); // collapse the double spaces left behind
    return $content; 
}
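
A minimal reproduction of the failure, using a shortened fragment of the paragraph above (the variable names are only illustrative):

```php
<?php
// Shortened fragment of the sample paragraph; $fragment and $result are illustrative names.
$fragment = '(traditionally a <a href="/wiki/Class_(biology)">class</a> or subclass) of';

// The lazy quantifier stops at the FIRST ")" it meets. Here that is the one
// inside the href attribute, not the one closing the outer parenthesis.
$result = preg_replace('@\(.*?\)@', '', $fragment);

echo $result, PHP_EOL; // prints: ">class</a> or subclass) of
```

The match ends inside the href, so half of the link tag is removed and the tail of the parenthetical text is left behind.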

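Secondly, how can I leave the parentheses inside a link's href="" and title="" alone? They matter when the link is not inside a parenthesis in the text.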

Source

2017-10-18 Lazhar

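Regular expressions cannot handle recursion. If you have recursive patterns (parentheses within parentheses...) you need more logic, i.e. write a parser – Philipp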

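Don't parse HTML with regex. As @Philipp said, it can't do this effectively (sure, you can hack together a version that works, but I guarantee you can break it with something obscure in the HTML). Use an XML parser like SimpleXML (http://php.net/manual/en/simplexml.examples.php) – ctwheels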

You may want to refer to https://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php for a list of tools if you are trying to parse HTML with PHP – Jeff

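You can replace all the links with placeholders, then remove all the parentheses, and at the end swap the placeholders back for their original values.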

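This is done with preg_replace_callback(), passing in an occurrence counter and a replacements array to keep track of the links, then using your own removeParentheses() to get rid of the parentheses, and finally using str_replace() with array_keys() and array_values() to get your links back: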

<?php 
$string = '<p> 
The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> (from Greek σαρξ <i>sarx</i>, flesh, and πτερυξ <i>pteryx</i>, fin) – sometimes considered synonymous with <b>Crossopterygii</b> ("fringe-finned fish", from Greek κροσσός <i>krossos</i>, fringe) – constitute a <a href="/wiki/Clade" title="Clade">clade</a> (traditionally a <a href="/wiki/Class_(biology)" title="Class (biology)">class</a> or subclass) of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>. 
</p>'; 
$occurrences = 0; 
$replacements = []; 
$replacedString = preg_replace_callback("/<a .*?>.*?<\/a>/i", function($el) use (&$occurrences, &$replacements) { 
    // the ||| delimiters avoid unwanted matches; the trailing one keeps |||1 from matching inside |||10 
    $key = "|||".$occurrences++."|||"; 
    $replacements[$key] = $el[0]; 
    return $key; 
}, $string); 
function removeParentheses($content) { 
    $pattern = '@\(.*?\)@'; 
    $content = preg_replace($pattern, '', $content); 
    $content = str_replace(' .', '.', $content); 
    $content = str_replace('  ', ' ', $content); // collapse the double spaces left behind
    return $content; 
} 
$replacedString = removeParentheses($replacedString); 
$replacedString = str_replace(array_keys($replacements), array_values($replacements), $replacedString); // get your links back 
echo $replacedString;

Result:

<p> 
The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> – sometimes considered synonymous with <b>Crossopterygii</b> – constitute a <a href="/wiki/Clade" title="Clade">clade</a> of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>. 
</p>

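This seems fragile to me, though. As others have told you in the comments, you shouldn't parse HTML with regular expressions. A lot can change, and you can get unexpected results. Still, this may point you in the right direction.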
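
For comparison, here is a rough sketch of the parser-based route the comments suggest. It is not from the original answer: stripParenthesesInDom() is a hypothetical helper, the sample markup is a shortened ASCII-only variant of the question's paragraph, and it assumes every "(" and its matching ")" sit in text that is a direct child of the same element. Non-ASCII input may also need an explicit encoding hint before DOMDocument::loadHTML() reads it correctly.

```php
<?php
// Sketch only: strip parenthesised spans by walking DOM nodes, not raw HTML.
// Assumption: each "(" and its ")" appear in text directly under $parent.
function stripParenthesesInDom(DOMNode $parent): void
{
    $depth = 0; // how many "(" are currently open
    // Snapshot the children first, because some are removed during the loop.
    foreach (iterator_to_array($parent->childNodes) as $child) {
        if ($child instanceof DOMText) {
            $kept = '';
            foreach (preg_split('//u', $child->data, -1, PREG_SPLIT_NO_EMPTY) as $char) {
                if ($char === '(') {
                    $depth++;
                } elseif ($char === ')' && $depth > 0) {
                    $depth--;
                } elseif ($depth === 0) {
                    $kept .= $char; // text outside any parenthesis survives
                }
            }
            $child->data = $kept;
        } elseif ($depth > 0) {
            $parent->removeChild($child); // element inside an open parenthesis
        }
    }
}

$html = '<p>The <b>fish</b> (from Greek <i>sarx</i>, flesh) form a '
      . '<a href="/wiki/Clade">clade</a> (traditionally a '
      . '<a href="/wiki/Class_(biology)" title="Class (biology)">class</a>) of '
      . '<a href="/wiki/Fish_(food)">fish</a>.</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$p = $doc->getElementsByTagName('p')->item(0);
stripParenthesesInDom($p);

$out = $doc->saveHTML($p);
echo $out, PHP_EOL;
```

Because attribute values are never scanned, the href of a link that sits outside any text parenthesis is left alone, while a link inside one is dropped together with it. The doubled spaces left behind would still need collapsing, as in removeParentheses().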

Edit: regarding parentheses within parentheses, you can use a recursive pattern. Take a look at this great answer by Bart Kiers:

function removeParentheses($content) { 
    $pattern = '@\(([^()]|(?R))*\)@'; 
    $content = preg_replace($pattern, '', $content); 
    $content = str_replace(' .', '.', $content); 
    $content = str_replace('  ', ' ', $content); // collapse the double spaces left behind
    return $content; 
}
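
A small check of what the recursive pattern does on its own, with made-up sample strings (the variable names are illustrative):

```php
<?php
// (?R) re-enters the whole pattern, so every "(" must be closed by its own ")".
$pattern = '@\(([^()]|(?R))*\)@';

// Nested parentheses are removed as one balanced group.
$nested = 'keep (drop (inner) drop) keep';
$nestedResult = preg_replace($pattern, '', $nested);
echo $nestedResult, PHP_EOL; // prints: keep  keep

// The pattern knows nothing about HTML: a balanced pair inside an href
// is removed as well if the link has not been swapped for a placeholder.
$link = '<a href="/wiki/Class_(biology)">class</a>';
$linkResult = preg_replace($pattern, '', $link);
echo $linkResult, PHP_EOL; // prints: <a href="/wiki/Class_">class</a>
```

So the recursive pattern takes care of the nesting, but the placeholder step is still what protects the parentheses in href="" and title="".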

Source

2017-10-18 16:30:39 ishegg

This doesn't handle the parentheses-within-parentheses problem the user asked about, only the problem of parentheses inside links. https://3v4l.org/VDebj – Jeff

@Jeff Thanks. It does now. – ishegg
