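Regex matching with word boundaries is too loose

I have the following code, in which I am trying to match specific words exactly using word boundaries, replace them with "censored", and then rebuild the text, but for some reason the regular expression is capturing a trailing slash. For clarity, I have reduced it to the following test case: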

<?php 

$words = array('bad' => "censored"); 
$text = "bad bading testbadtest badder"; 
$newtext = ""; 

foreach(preg_split("/(\[\/?(?:acronym|background|\*)(?:=.+?)?\]|(^|\W)bad(\W|$))/i", $text, null, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY) as $section) 
{ 
    if (isset($words[ $section ]) ) 
    { 
     $newtext .= $words[ $section ]; 
    } 
    else 
    { 
     $newtext .= $section ; 
    } 
} 

var_dump($newtext);

exit;

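In this example I expect "bad" to be matched, but not testbadtest or badder. The problem is that "bad " (note the trailing space) is what gets matched, and that string does not exist as a key in the $words array.

Could someone explain where I might be going wrong?

Thanks in advance.

Source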

2013-10-25 MopeyGecko

`bad(\W|$)` means "bad" followed by any non-word character (or the end of the string), which here is a space. What you need is an assertion, such as `bad(?=\W)`, or `bad\b`. http://us2.php.net/manual/en/regexp.reference.assertions.php – zerkms
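The difference described in this comment can be checked directly. The following is a minimal sketch using the test string from the question; the variable names `$consuming`, `$boundary` and `$parts` are illustrative, and the last call is only one possible way to adapt the original split, not necessarily what the asker ended up using.

```php
<?php
$text = "bad bading testbadtest badder";

// Pattern from the question: the \W after "bad" is consumed as part of the match.
preg_match('/(^|\W)bad(\W|$)/i', $text, $consuming);

// Zero-width alternative: \b only asserts a boundary and consumes nothing.
preg_match('/\bbad\b/i', $text, $boundary);

var_dump($consuming[0]); // string(4) "bad "
var_dump($boundary[0]);  // string(3) "bad"

// The split from the question, with \bbad\b replacing (^|\W)bad(\W|$).
// The outer capture group now yields exactly "bad", which is a key in $words.
$parts = preg_split(
    '/(\[\/?(?:acronym|background|\*)(?:=.+?)?\]|\bbad\b)/i',
    $text,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
var_dump($parts); // "bad", then " bading testbadtest badder"
```

With the boundary version, the delimiter returned by `preg_split()` no longer carries the surrounding whitespace, so the `isset($words[$section])` lookup in the original loop can succeed.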

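Why are you using `preg_split` for this? – Steven

Also, does your `$words` array account for 'bad' with a space? If the problem is a space, have you thought about using [trim](http://php.net/trim) before trying to match? – Steven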

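I think I would take a different approach, as I don't see why you are using preg_split() with a regular expression that hard-codes the censored words.

Just build an array of patterns to replace, along with their replacements, and use preg_replace().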

// note no space in words or their replacements 
$word_replacement_map = array(
    'bad' => 'b*d', 
    'alsobad' => 'a*****d' 
); 
$bad_words = array_keys($word_replacement_map); 
$patterns = array_map(function($item) { 
    return '/\b' . preg_quote($item, '/') . '\b/u'; 
}, $bad_words); 
$replacements = array_values($word_replacement_map); 
$input_string = 'the string with bad and alsobad words'; 
$cleaned_string = preg_replace($patterns, $replacements, $input_string); 
var_dump($cleaned_string); // the string with b*d and a*****d words

Note that if you don't need word-specific replacements, you can simplify this to:

// note no space in words 
$bad_words = array(
    'bad', 
    'alsobad' 
); 
$replacement = 'censored'; 
$patterns = array_map(function($item) { 
    return '/\b' . preg_quote($item, '/') . '\b/u'; 
}, $bad_words); 
$input_string = 'the string with bad and alsobad words'; 
$cleaned_string = preg_replace($patterns, $replacement, $input_string); 
var_dump($cleaned_string); // the string with censored and censored words

Note that I am using word boundaries in the regex patterns here, which should generally meet your needs.

Source
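To tie this back to the test string in the question, here is a rough sketch that applies the same word-boundary idea with a single combined pattern. The helper name `censor_words` is hypothetical and not part of the answer above; it assumes PHP 7.4 or later (for arrow functions) and that the map keys are lowercase ASCII words.

```php
<?php
// Hypothetical helper, not from the answer: censor whole words only.
function censor_words(string $text, array $map): string
{
    if ($map === []) {
        return $text; // nothing to replace
    }

    // Quote each word (including the "/" delimiter) and join into one alternation.
    $alternatives = array_map(
        fn(string $word): string => preg_quote($word, '/'),
        array_keys($map)
    );
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')\b/iu';

    // Look up each matched word in the map, ignoring case (keys assumed lowercase).
    $result = preg_replace_callback(
        $pattern,
        fn(array $m): string => $map[strtolower($m[0])],
        $text
    );

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8 input).
    return $result ?? $text;
}

$words = ['bad' => 'censored'];
echo censor_words('bad bading testbadtest badder', $words), "\n";
// censored bading testbadtest badder
```

Only the standalone "bad" is replaced; "bading", "testbadtest" and "badder" are left alone because `\b` does not match between two word characters.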

2013-10-25 23:46:20
