PHP DOMDocument - match and remove URLs
I am trying to extract links from an HTML page using the DOM:
$html = file_get_contents('links.html');
$DOM = new DOMDocument();
$DOM->loadHTML($html);
$a = $DOM->getElementsByTagName('a');
foreach ($a as $link) {
    // echo out the href attribute of the <a> tag
    echo $link->getAttribute('href') . '<br/>';
}
Output:
http://dontwantthisdomain.com/dont-want-this-domain-name/
http://dontwantthisdomain2.com/also-dont-want-any-pages-from-this-domain/
http://dontwantthisdomain3.com/dont-want-any-pages-from-this-domain/
http://domain1.com/page-X-on-domain-com.html
http://dontwantthisdomain.com/dont-want-link-from-this-domain-name.html
http://dontwantthisdomain2.com/dont-want-any-pages-from-this-domain/
http://domain.com/page-XZ-on-domain-com.html
http://dontwantthisdomain.com/another-page-from-same-domain-that-i-dont-want-to-be-included/
http://dontwantthisdomain2.com/same-as-above/
http://domain3.com/page-XYZ-on-domain3-com.html
I want to remove every result that matches dontwantthisdomain.com, dontwantthisdomain2.com, or dontwantthisdomain3.com, so that the output looks like this:
http://domain1.com/page-X-on-domain-com.html
http://domain.com/page-XZ-on-domain-com.html
http://domain3.com/page-XYZ-on-domain3-com.html
Any ideas? :)
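One way to get the filtered list is to compare each link's host name against a blocklist while looping over the anchors. The following is only a minimal sketch under some assumptions: the helper name `filterLinks` and the inline sample HTML are made up for illustration, and the match is on the exact host name, so a `www.` variant of a blocked domain would need its own entry.

```php
<?php
// Hypothetical helper (not part of the original code): returns the href of
// every <a> whose host is NOT listed in $blockedHosts.
function filterLinks(string $html, array $blockedHosts): array
{
    libxml_use_internal_errors(true);   // keep warnings about sloppy HTML quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $kept = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        if ($href === '') {
            continue;                   // anchor without an href
        }
        // parse_url() returns null when the URL has no host (e.g. relative links)
        $host = parse_url($href, PHP_URL_HOST);
        if (is_string($host) && in_array(strtolower($host), $blockedHosts, true)) {
            continue;                   // blocked domain, skip it
        }
        $kept[] = $href;
    }
    return $kept;
}

// Sample input standing in for links.html (assumed markup).
$html = '<a href="http://dontwantthisdomain.com/dont-want-this-domain-name/">a</a>'
      . '<a href="http://domain1.com/page-X-on-domain-com.html">b</a>'
      . '<a href="http://dontwantthisdomain2.com/same-as-above/">c</a>';

$blocked = ['dontwantthisdomain.com', 'dontwantthisdomain2.com', 'dontwantthisdomain3.com'];

$result = filterLinks($html, $blocked);
foreach ($result as $href) {
    echo $href . '<br/>';
}
```

Comparing the parsed host rather than searching the whole URL avoids dropping a wanted link that merely mentions a blocked domain somewhere in its path or query string.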
`$x = new DOMXPath($DOM); $x->query('//a/@href/[not(contains(text(), "dontwantthisdomain"))]');` :P – kojiro
@yann-milin could you take a look and let me know what you think? Thanks, pal – Kris
@kojiro: it seems your code causes an error. Could you please take a look? Thanks :) – Kris
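The error is most likely down to the XPath expression itself. In XPath 1.0 a predicate cannot directly follow a `/`, and an attribute node has no `text()` children to test. A sketch of a variant that should parse is below; it moves the test onto the `<a>` element and checks `@href`. The sample HTML is assumed for illustration. Note that this is a plain substring match, so it would also drop any URL that happens to contain "dontwantthisdomain" outside the host part.

```php
<?php
// Sample input standing in for links.html (assumed markup).
$html = '<a href="http://dontwantthisdomain.com/dont-want-this-domain-name/">a</a>'
      . '<a href="http://domain1.com/page-X-on-domain-com.html">b</a>'
      . '<a href="http://dontwantthisdomain3.com/dont-want-any-pages-from-this-domain/">c</a>'
      . '<a href="http://domain3.com/page-XYZ-on-domain3-com.html">d</a>';

libxml_use_internal_errors(true);       // keep warnings about sloppy HTML quiet
$DOM = new DOMDocument();
$DOM->loadHTML($html);
libxml_clear_errors();

$x = new DOMXPath($DOM);

// Select the href attribute of every <a> whose href does not contain the substring.
$nodes = $x->query('//a[not(contains(@href, "dontwantthisdomain"))]/@href');

$kept = [];
foreach ($nodes as $attr) {
    $kept[] = $attr->nodeValue;         // value of the href attribute
    echo $attr->nodeValue . '<br/>';
}
```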