2011-10-17 79 views
1

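PHP regular expression to match a keyword outside HTML <a> tags

I have been trying to write a regular expression that matches and replaces occurrences of a keyword in part of some HTML:

  1. I want to match keyword and <strong>keyword</strong>
  2. <a href="someurl.html" target="_blank">keyword</a> and <a href="someur2.html">already linked keyword </a> should NOT match

I am only interested in matching (and replacing) the keyword in case 1.

The reason I want this is to replace keyword with <a href="dictionary.php?k=keyword">keyword</a>, but only when keyword is not already inside an <a> tag.

Any help will be greatly appreciated!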

+1

I cleaned this up a bit because the formatting was poor, but I'm not sure my corrections are entirely right... tixastronauta, if my "fix" introduced errors, please edit and correct them. – eykanal

Answers

1

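I was able to do what I wanted (without using a regular expression) by:

  • parsing my string
  • removing all <a> tags (copying them into a temporary array and leaving a placeholder for each one in the string)
  • running str_replace on the new string to replace all keywords
  • re-populating the placeholders with the original <a> tags

Here is the code I used, in case someone else needs it: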

$str = <<<STRA
Moses supposes his toeses are roses,
but <a href="original-moses1.html">Moses</a> supposes erroneously;
for nobody's toeses are posies of roses,
as Moses supposes his toeses to be.
Ganda <span class="cenas"><a href="original-moses2.html" target="_blank">Moses</a></span>!
STRA;

$arr1 = str_split($str); 

$arr_links = array(); 
$phrase_holder = ''; 
$current_a = 0; 
$goto_arr_links = false; 
$close_a = false; 

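foreach ($arr1 as $k => $v)
{
    // Still inside a closing </a> that was already stored: skip up to its '>'.
    if ($close_a) {
        if ($v == '>') {
            $close_a = false;
        }
        continue;
    }

    // Inside a link: keep collecting every char until </a>.
    if ($goto_arr_links) {
        $arr_links[$current_a] .= $v;
    }

    // Look ahead safely; ?? '' avoids undefined-offset warnings at the end of the string.
    $c1 = $arr1[$k + 1] ?? '';
    $c2 = $arr1[$k + 2] ?? '';
    $c3 = $arr1[$k + 3] ?? '';

    if (!$goto_arr_links && $v == '<' && $c1 == 'a' && ($c2 == '>' || ctype_space($c2))) { /* <a */
        // Start a new entry; checking $c2 keeps <abbr>, <article>, ... from counting as links.
        $arr_links[$current_a] = $v;
        $goto_arr_links = true;
    } elseif ($goto_arr_links && $v == '<' && $c1 == '/' && $c2 == 'a' && $c3 == '>') { /* </a> */
        // The '<' was already collected above, so only the rest of the tag is added.
        $arr_links[$current_a] .= "/a>";

        $goto_arr_links = false;
        $close_a = true;
        $phrase_holder .= "{%$current_a%}"; /* put a parameter holder on the phrase */
        $current_a++;
    } elseif (!$goto_arr_links) {
        // Plain text (or any non-link tag) goes straight to the phrase.
        $phrase_holder .= $v;
    }
}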

echo "Links Array:\n"; 
print_r($arr_links); 
echo "\n\n\nPhrase Holder:\n"; 
echo $phrase_holder; 
echo "\n\n\n(pre) Final Phrase (with my keyword replaced):\n"; 
$final_phrase = str_replace("Moses", "<a href=\"novo-mega-link.php\">Moses</a>", $phrase_holder); 
echo $final_phrase; 
echo "\n\n\nFinal Phrase:\n"; 
foreach($arr_links as $k => $v) 
{ 
    $final_phrase = str_replace("{%$k%}", $v, $final_phrase); 
} 
echo $final_phrase; 

Output:

Links Array:

Array 
(
    [0] => <a href="original-moses1.html">Moses</a> 
    [1] => <a href="original-moses2.html" target="_blank">Moses</a> 
) 

Phrase Holder:

Moses supposes his toeses are roses, 
but {%0%} supposes erroneously; 
for nobody's toeses are posies of roses, 
as Moses supposes his toeses to be. 
Ganda <span class="cenas">{%1%}</span>! 

(pre) Final Phrase (with my keyword replaced):

<a href="novo-mega-link.php">Moses</a> supposes his toeses are roses, 
but {%0%} supposes erroneously; 
for nobody's toeses are posies of roses, 
as <a href="novo-mega-link.php">Moses</a> supposes his toeses to be. 
Ganda <span class="cenas">{%1%}</span>! 

Final Phrase:

<a href="novo-mega-link.php">Moses</a> supposes his toeses are roses, 
but <a href="original-moses1.html">Moses</a> supposes erroneously; 
for nobody's toeses are posies of roses, 
as <a href="novo-mega-link.php">Moses</a> supposes his toeses to be. 
Ganda <span class="cenas"><a href="original-moses2.html" target="_blank">Moses</a></span>! 
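
The same placeholder idea can also be sketched more compactly. This is not the code from the answer above: it swaps the character loop for preg_replace_callback, and the helper name link_keyword_outside_anchors() and its parameters are hypothetical. It assumes links are not nested and that the text never contains a literal {%N%} sequence.

```php
<?php
// Hypothetical helper: hide <a> elements, link the keyword, restore the links.
function link_keyword_outside_anchors(string $html, string $keyword, string $url): string
{
    $links = [];

    // 1. Swap every <a ...>...</a> element for a numbered placeholder.
    $held = preg_replace_callback(
        '~<a\b[^>]*>.*?</a>~is',
        function (array $m) use (&$links): string {
            $links[] = $m[0];
            return '{%' . (count($links) - 1) . '%}';
        },
        $html
    );

    // 2. Link the keyword in what is left (case-sensitive, like str_replace).
    $replacement = '<a href="' . htmlspecialchars($url) . '">' . $keyword . '</a>';
    $held = str_replace($keyword, $replacement, $held);

    // 3. Put the original links back in place of their placeholders.
    foreach ($links as $i => $link) {
        $held = str_replace('{%' . $i . '%}', $link, $held);
    }

    return $held;
}

$sample = 'Moses supposes, but <a href="original-moses1.html">Moses</a> supposes erroneously.';
echo link_keyword_outside_anchors($sample, 'Moses', 'novo-mega-link.php'), "\n";
```

Like the original loop, this only protects <a> elements; a keyword inside another tag's attribute (for example an alt text) would still be replaced.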

0

$lines = explode("\n", $content);
$lines[0] = str_ireplace("keyword", "replacement", $lines[0]);
$content = implode("\n", $lines);

Or, if you explicitly want to use a regular expression:

$lines = explode("\n", $content); 
$lines[0] = preg_replace("/keyword/i", "replacement", $lines[0]); 
$content = implode("\n", $lines); 

+0

Those directly replace all occurrences of "keyword". I only want to replace some of them. Thanks anyway – tixastronauta

-1

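Consider using an HTML parsing library, such as simplehtmldom, instead of a regular expression. You can use it to update the contents of specific HTML tags only (and so ignore the content you don't want to change). Then you don't need a regular expression at all; once you have filtered down to the appropriate tags, you can use a function like str_replace.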
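
As a rough illustration of the parser-based route, the sketch below uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than simplehtmldom, so it is a swapped-in technique and not that library's API. The helper name link_keyword_in_dom() and the wrapper id "wrap" are made up for this example, and charset handling is left out, so it assumes plain ASCII input.

```php
<?php
// Hypothetical helper: link a keyword in every text node that has no <a> ancestor.
function link_keyword_in_dom(string $html, string $keyword, string $url): string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);   // silence warnings about fragments
    // Wrap the fragment in a known container so it can be extracted again.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@id="wrap"]//text()[not(ancestor::a)]');

    // Copy the node list first, because the loop modifies the tree.
    foreach (iterator_to_array($nodes) as $text) {
        $parts = explode($keyword, $text->nodeValue);
        if (count($parts) < 2) {
            continue;                            // keyword not in this text node
        }
        $frag = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i > 0) {                        // one new link between two parts
                $a = $doc->createElement('a');
                $a->setAttribute('href', $url);
                $a->appendChild($doc->createTextNode($keyword));
                $frag->appendChild($a);
            }
            if ($part !== '') {
                $frag->appendChild($doc->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($frag, $text);
    }

    // Serialize only the children of the wrapper, not the whole document.
    $out = '';
    foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo link_keyword_in_dom(
    'Moses supposes, but <a href="original-moses1.html">Moses</a> errs.',
    'Moses',
    'novo-mega-link.php'
), "\n";
```

Because the replacement works on text nodes only, keywords inside attribute values are never touched, which is the main advantage over string-level approaches.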

3

$str = preg_replace('~Moses(?!(?>[^<]*(?:<(?!/?a\b)[^<]*)*)</a>)~i',
        '<a href="novo-mega-link.php">$0</a>', $str);

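The expression inside the negative lookahead matches up to the next closing </a> tag, but only if it does not see an opening <a> tag first. If it succeeds, the word Moses is inside an anchor element, so the lookahead fails and no match occurs.

Here is a demo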
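
A quick way to check the pattern locally is sketched below. The sample string is a shortened, single-line version of the text from the question's accepted approach, and the variable names are only illustrative.

```php
<?php
// Shortened sample: two unlinked "Moses" and two that are already linked.
$str = 'Moses supposes his toeses are roses, '
     . 'but <a href="original-moses1.html">Moses</a> supposes erroneously; '
     . 'as Moses supposes his toeses to be. '
     . 'Ganda <span class="cenas"><a href="original-moses2.html">Moses</a></span>!';

// Match "Moses" unless a closing </a> is reached before any opening <a>.
$pattern = '~Moses(?!(?>[^<]*(?:<(?!/?a\b)[^<]*)*)</a>)~i';
$result  = preg_replace($pattern, '<a href="novo-mega-link.php">$0</a>', $str);

echo $result, "\n";
```

Only the two unlinked occurrences are wrapped. The "moses" inside the href values and the existing link texts stay as they were, because for each of them the lookahead finds a closing </a> before any opening <a>.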

+0

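Thanks Alan, but your regex also replaces the keyword 'moses' inside the '<a>' tag itself. So, in your sample, 'but <a href="original-moses1.html">Moses</a> supposes erroneously;' gets the 'moses' in 'original-moses1.html' replaced as well. I don't want that to happen – tixastronauta

+0

Sorry, I was experimenting with '\s' instead of '\b' in the lookahead and accidentally left it in –

+0

Awesome! Thank you very much! – tixastronauta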
