PHP DOM: get link URLs and anchor text (but not image links)
I want to collect all the links on an HTML page into an array. For example, given:
This is a webpage <a href="http://somesite.com/link1.php">with</a>
different kinds of <a href="http://somesite.com/link1.php"><img src="someimg.png"></a>
The output I want is:
with => http://somesite.com/link1.php
What I currently get is:
<img src="someimg.png"> => http://somesite.com/link1.php
with => http://somesite.com/link1.php
I don't want links that have an image between the opening and closing <a> tags, only the text links.
My current code is:
<?php
// Returns the serialized HTML of all child nodes of $node.
function innerHTML($node) {
    $ret = '';
    foreach ($node->childNodes as $child) {
        $ret .= $child->ownerDocument->saveHTML($child);
    }
    return $ret;
}
$html = file_get_contents('http://somesite.com/'.$_GET['apt']);
$dom = new DOMDocument;
@$dom->loadHTML($html); // @ suppresses warnings about malformed HTML
$links = $dom->getElementsByTagName('a');
$result = array();
foreach ($links as $link) {
    //$node = $link->nodeValue;
    $node = innerHTML($link); // includes inner markup such as <img>
    $href = $link->getAttribute('href');
    if (preg_match('/\.pdf$/i', $href)) {
        $result[$node] = $href;
    }
}
print_r($result);
?>
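One possible approach, sketched under assumptions: skip any <a> element that contains an <img> anywhere inside it (checked with DOMElement::getElementsByTagName('img')), and use textContent instead of the inner HTML so only the plain link text becomes the array key. The function name textLinks is hypothetical, the sample markup is the one from the question, and the .pdf filter from the original code is left out so the example links match; it can be added back inside the loop.

```php
<?php
// Hypothetical helper: collect "link text => href" pairs,
// skipping anchors that wrap an image.
function textLinks(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about malformed HTML instead of printing them
    // (an alternative to the @ operator used in the question).
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        // Skip any <a> that has an <img> descendant.
        if ($link->getElementsByTagName('img')->length > 0) {
            continue;
        }
        // textContent is the plain text of the anchor, without inner tags.
        $text = trim($link->textContent);
        $result[$text] = $link->getAttribute('href');
    }
    return $result;
}

$html = 'This is a webpage <a href="http://somesite.com/link1.php">with</a> '
      . 'different kinds of <a href="http://somesite.com/link1.php"><img src="someimg.png"></a>';

print_r(textLinks($html));
```

For the sample input this should leave only the entry with => http://somesite.com/link1.php. An XPath query such as //a[not(.//img)] via DOMXPath is another way to express the same filter.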
Perfect! Thanks! :)