PHP regex: match specific words in HTML

I have HTML code like this:

<html> 
<div class="the_grp"> 
<h3>heading <span id="sn-sin" class="the_decs">(keyword: <i>cat</i>)</span></h3> 
<ul> 
    <li> 
     <div> 
      <div><span class="w_pos"></span></div> 
      <div class="w_the"> 
      <a href="http://www.exampledomain.com/20111/cute-cat">cute cat</a>, 
      <a href="http://www.exampledomain.com/7456/catty">catty</a>, 
      </div> 
     </div> 
    </li> 
    <li> 
     <div> 
      <div><span class="w_pos"></span></div> 
      <div class="w_the"> 
      <a href="http://www.exampledomain.com/7589/sweet">sweet</a>, 
      <a href="http://www.exampledomain.com/10852/sweet-cat">sweet cat</a>, 
      <a href="http://www.exampledomain.com/20114/cat-vs-dog">cat vs dog</a>, 
     </div> 
    </li> 
</ul> 
</div> 

<a id="ant"></a> 
<div class="the_grp"> 
<h3>another heading <span id="sn-an" class="the_decs">(ignore this: <i>cat</i>)</span></h3> 
<ul> 
    <li> 
     <div> 
      <div><span class="w_pos"></span></div> 
      <div class="w_the"><a href="http://www.exampledomain.com/118/bad-cat">bad cat</a></div> 
     </div> 
    </li> 
</ul> 
</div>

I want to match the following words in the HTML code:

cute cat
catty
sweet
sweet cat
cat vs dog

I'm using this pattern and taking the words from capture group [2]:

#<a href="http\:(.*?)">(.*?)<\/a>#i

My PHP code looks like this:

preg_match_all('#<a href="http\:(.*?)">(.*?)<\/a>#i', $data, $matches); 
echo '<pre>'; 
print_r($matches[2]); 
echo '</pre>';

This pattern matches "bad cat" too. How can I capture only these words: cute cat, catty, sweet, sweet cat, cat vs dog?

Thanks in advance.


2017-03-15 danul

I'd point you to [this post](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – ChrisG

Don't use regex to parse HTML. – Vallentin

The pattern you're using will match everything inside any `a` tag. What you're trying to do is scraping; look for a PHP library for that. – MikeVelazco
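For comparison, staying with a regex: the pattern by itself cannot tell which group a link belongs to, but it can be applied to a narrowed substring. A minimal sketch, assuming the wanted links all sit in the first `<div class="the_grp">` and that this block ends where the next `<div class="the_grp">` starts (true for the sample HTML, fragile on real pages); `first_group_link_texts` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: run the question's pattern only on the first
// <div class="the_grp"> block. Assumption: that block ends where the next
// <div class="the_grp"> begins, or at the end of the string.
function first_group_link_texts(string $data): array
{
    // Lazily capture from the first group's opening tag up to (not including)
    // the next group's opening tag; the "s" flag lets "." cross newlines.
    if (!preg_match('#<div class="the_grp">(.*?)(?=<div class="the_grp">|$)#is', $data, $m)) {
        return [];   // no the_grp block at all
    }
    // Same pattern as in the question, applied to the narrowed text only.
    preg_match_all('#<a href="http\:(.*?)">(.*?)<\/a>#i', $m[1], $matches);
    return $matches[2];   // capture group 2 = link text
}
```

This still breaks as soon as the markup changes (attribute order, extra classes, nested groups), which is the commenters' point.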

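It's best to just use an HTML parser. Here's how you could do it with http://simplehtmldom.sourceforge.net/.

If the HTML comes from a file or URL, file_get_html is the most convenient: it basically calls file_get_contents and then str_get_html.

str_get_html is how you parse a string into a Simple HTML DOM object.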

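<?php

require('simple_html_dom.php'); // the Simple HTML DOM library file

// Parse the HTML string ($data, as in the question) into a DOM object
$html = str_get_html($data);

// Print the plain text of every <a> element
foreach ($html->find('a') as $element) {
    echo $element->plaintext . '<br>';
}

?>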
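If installing simple_html_dom is not an option, PHP's bundled DOM extension (`DOMDocument`) can produce the same listing. A minimal sketch under that assumption; `all_link_texts` is a hypothetical helper name, and unlike the loop above it skips anchors with no text, such as `<a id="ant"></a>`:

```php
<?php
// Hypothetical helper using the built-in DOM extension instead of
// simple_html_dom: returns the trimmed text of every non-empty <a>.
function all_link_texts(string $html): array
{
    $doc = new DOMDocument();
    // The HTML is a fragment, not a valid full document: collect libxml
    // warnings internally instead of printing them, then restore the setting.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $texts = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $text = trim($a->textContent);
        if ($text !== '') {   // skip empty anchors like <a id="ant"></a>
            $texts[] = $text;
        }
    }
    return $texts;
}
```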

If you don't want "bad cat" to match, just loop over the results and remove or ignore it.

For example, to skip bad cat:

// Print every link text except "bad cat"
foreach ($html->find('a') as $element) {
    if ($element->plaintext != "bad cat") {
        echo $element->plaintext . '<br>';
    }
}
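Filtering on the literal text "bad cat" works for the sample but stops working once the second group contains other links. A hedged alternative, again swapping in PHP's built-in DOM extension for simple_html_dom, selects links by position in the markup: only those inside the group whose heading contains `id="sn-sin"`. This assumes that id and the `the_grp`/`w_the` class names stay exactly as in the sample HTML; `keyword_group_link_texts` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return link texts only from the div.the_grp whose
// heading contains <span id="sn-sin">; all other groups are ignored.
function keyword_group_link_texts(string $html): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);   // tolerate fragment markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // Group that contains the keyword span -> its w_the divs -> their links
    $query = '//div[@class="the_grp"][.//span[@id="sn-sin"]]//div[@class="w_the"]/a';

    $texts = [];
    foreach ($xpath->query($query) as $a) {
        $texts[] = trim($a->textContent);
    }
    return $texts;
}
```

The trade-off is a dependency on the `sn-sin` id instead of on the unwanted link's text.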


2017-03-15 19:21:24 Neil
