2013-09-25 57 views

PHP DOMDocument - match and remove URLs

I'm trying to extract links from an HTML page using DOM:

$html = file_get_contents('links.html'); 
$DOM = new DOMDocument(); 
$DOM->loadHTML($html); 
$a = $DOM->getElementsByTagName('a'); 
foreach($a as $link){ 
    //echo out the href attribute of the <A> tag. 
    echo $link->getAttribute('href').'<br/>'; 
} 

Output:

http://dontwantthisdomain.com/dont-want-this-domain-name/ 
http://dontwantthisdomain2.com/also-dont-want-any-pages-from-this-domain/ 
http://dontwantthisdomain3.com/dont-want-any-pages-from-this-domain/ 
http://domain1.com/page-X-on-domain-com.html 

http://dontwantthisdomain.com/dont-want-link-from-this-domain-name.html 
http://dontwantthisdomain2.com/dont-want-any-pages-from-this-domain/ 
http://domain.com/page-XZ-on-domain-com.html 

http://dontwantthisdomain.com/another-page-from-same-domain-that-i-dont-want-to-be-included/ 
http://dontwantthisdomain2.com/same-as-above/ 
http://domain3.com/page-XYZ-on-domain3-com.html 

I want to remove every result that matches dontwantthisdomain.com, dontwantthisdomain2.com or dontwantthisdomain3.com, so that the output looks like this:

http://domain1.com/page-X-on-domain-com.html 
http://domain.com/page-XZ-on-domain-com.html 
http://domain3.com/page-XYZ-on-domain3-com.html 

Any ideas? :)
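One way to get that output, sketched under assumptions: rather than matching on text, compare each link's host (from PHP's parse_url() with PHP_URL_HOST) against a blocklist. The filterLinks() helper, the sample HTML and the $blocked list are hypothetical, built from the URLs above. A leading "www." is stripped so www.dontwantthisdomain.com is caught too, and relative links (which have no host) are kept.

```php
<?php
// Hypothetical helper: return the href of every <a> whose host is NOT in $blockedHosts.
function filterLinks(string $html, array $blockedHosts): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for sloppy HTML.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $kept = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        // parse_url() gives null for PHP_URL_HOST on relative links; cast to ''.
        $host = strtolower((string) parse_url($href, PHP_URL_HOST));
        // Treat "www.example.com" the same as "example.com".
        if (strncmp($host, 'www.', 4) === 0) {
            $host = substr($host, 4);
        }
        if (!in_array($host, $blockedHosts, true)) {
            $kept[] = $href;
        }
    }
    return $kept;
}

// Illustrative input assembled from the question's output.
$html = '<a href="http://dontwantthisdomain.com/dont-want-this-domain-name/">x</a>'
      . '<a href="http://domain1.com/page-X-on-domain-com.html">y</a>'
      . '<a href="http://dontwantthisdomain2.com/same-as-above/">z</a>'
      . '<a href="http://domain3.com/page-XYZ-on-domain3-com.html">w</a>';

$blocked = ['dontwantthisdomain.com', 'dontwantthisdomain2.com', 'dontwantthisdomain3.com'];

foreach (filterLinks($html, $blocked) as $href) {
    echo $href, "<br/>\n";
}
```

Comparing whole hosts avoids false positives such as a wanted page whose path merely contains the word "dontwantthisdomain".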

$x = new DOMXPath($DOM); $x->query('//a/@href/[not(contains(text(),"dontwantthisdomain"))]'); :P – kojiro
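A corrected version of the XPath idea, as a sketch: the predicate has to sit on the a element and test @href. The step "/[" in the query above is not valid XPath, and text() selects nothing on an attribute node. The sample HTML is made up for illustration.

```php
<?php
// Illustrative input: two unwanted links and one wanted link.
$html = '<a href="http://dontwantthisdomain.com/a/">1</a>'
      . '<a href="http://domain1.com/page-X-on-domain-com.html">2</a>'
      . '<a href="http://dontwantthisdomain3.com/b/">3</a>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Filter on the <a> element, then select its href attribute.
// contains() is a plain substring test, so "dontwantthisdomain" also
// matches dontwantthisdomain2.com and dontwantthisdomain3.com.
$nodes = $xpath->query('//a[not(contains(@href, "dontwantthisdomain"))]/@href');

$hrefs = [];
foreach ($nodes as $attr) {
    // Each result is a DOMAttr; ->value holds the URL string.
    $hrefs[] = $attr->value;
    echo $attr->value, "<br/>\n";
}
```

Because contains() matches substrings anywhere in the attribute, this would also drop a wanted URL that only mentions dontwantthisdomain in its path or query string.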

@yann-milin Could you take a look and let me know what you think? Thanks, pal – Kris

@kojiro: It seems your code causes an error. Could you elaborate? Thanks :) – Kris

Answer

I think you should use regular expressions. Google it and have fun.
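If you do go the regex route, here is a minimal sketch that runs preg_grep() with PREG_GREP_INVERT over the href list printed by the question's loop. The pattern is an assumption tailored to the three example domains: it anchors on the scheme and requires the domain to be followed by "/", ":", "?", "#" or the end of the string, so a host like dontwantthisdomain.com.example.net is not caught by accident.

```php
<?php
// The hrefs printed by the question's DOMDocument loop.
$hrefs = [
    'http://dontwantthisdomain.com/dont-want-this-domain-name/',
    'http://dontwantthisdomain2.com/also-dont-want-any-pages-from-this-domain/',
    'http://dontwantthisdomain3.com/dont-want-any-pages-from-this-domain/',
    'http://domain1.com/page-X-on-domain-com.html',
    'http://domain.com/page-XZ-on-domain-com.html',
    'http://domain3.com/page-XYZ-on-domain3-com.html',
];

// "~" is the delimiter so that "#" can appear literally inside the character class.
// Matches http(s)://, an optional "www.", then dontwantthisdomain(.com|2.com|3.com).
$pattern = '~^https?://(?:www\.)?dontwantthisdomain[23]?\.com(?:[/:?#]|$)~i';

// PREG_GREP_INVERT keeps the entries that do NOT match; array_values() reindexes.
$kept = array_values(preg_grep($pattern, $hrefs, PREG_GREP_INVERT));

foreach ($kept as $href) {
    echo $href, "<br/>\n";
}
```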

With $html = preg_replace('# Kris