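Rather than using a RegEx to find the relevant tags in the HTML, it is fairly easy to use DOMDocument & DOMXPath, as shown below.
The XPath query selects every <a> element whose href does not contain "#", and the loop removes each of those elements (link text included), so only links whose href contains "#" survive. The last line simply echoes the final edited HTML into a textarea, but you could just as easily save it to a file.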
<?php
/* XPath expression: every <a> element whose href does not contain "#" */
$query = '//a[not(contains(@href, "#"))]';
/* Some url */
$url = 'http://stackoverflow.com/questions/39737604/keeping-anchor-tags-and-removing-other-hyperlinks-php-regex';
/* get the data */
$html = file_get_contents($url);
if ($html === false) {
    exit('Could not fetch ' . $url);
}
/* construct DOMDocument & DOMXPath objects
   (libxml warnings about malformed real-world HTML are collected, not printed) */
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();
$xp = new DOMXPath($dom);
/* Run the query; query() returns false only if the expression is malformed */
$col = $xp->query($query);
/* Process all found nodes (empty() is never true for an object, so test for false) */
if ($col !== false) {
    /*
        As you are removing nodes from the DOM you should
        iterate backwards through the collection.
    */
    for ($i = $col->length; --$i >= 0;) {
        $a = $col->item($i);
        $a->parentNode->removeChild($a);
    }
    /* do something with processed html (escaped so the markup shows as text) */
    echo "<textarea cols=150 rows=100>", htmlspecialchars($dom->saveHTML()), "</textarea>";
}
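To try the technique without a network request, here is a minimal sketch, assuming PHP 7+ with the DOM extension. It wraps the same DOMDocument/DOMXPath steps in a function that takes an HTML string, then writes the result to a file instead of echoing it into a textarea. The function name stripNonFragmentLinks, the sample markup and the output file name edited.html are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: remove every <a> whose href does not contain "#",
// working on an HTML string rather than on a fetched URL.
function stripNonFragmentLinks(string $html): string
{
    // Collect libxml warnings about imperfect HTML instead of printing them,
    // and restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    $col = $xp->query('//a[not(contains(@href, "#"))]');

    if ($col !== false) {
        // Remove matches from the last one to the first, mirroring the loop above.
        for ($i = $col->length - 1; $i >= 0; $i--) {
            $a = $col->item($i);
            $a->parentNode->removeChild($a);
        }
    }
    return $dom->saveHTML();
}

// Hypothetical sample: one in-page link and one external link.
$sample = '<p><a href="#top">Top</a> <a href="http://example.com/">Elsewhere</a></p>';
$result = stripNonFragmentLinks($sample);

// Save the edited HTML to a file (hypothetical name, in the system temp directory).
$outFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'edited.html';
file_put_contents($outFile, $result);
```

Note that in XPath 1.0, contains(@href, "#") is false for an <a> with no href at all (such as <a name="...">), so those elements are removed as well. If they should be kept, the query can be narrowed to //a[@href and not(contains(@href, "#"))].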
Using 'DOMDocument' & 'DOMXPath' is easier than a regex – RamRaider
Tried to flesh out the solution a little –