Grabbing all images from a website using DOMDocument

I basically want to get all the images from a website using DOMDocument. But I can't even load my HTML, for some reason I don't know yet.

$url="http://<any_url_here>/"; 
$dom = new DOMDocument(); 
@$dom->loadHTML($url); //i have also tried removing @ 
$dom->preserveWhiteSpace = false; 
$dom->saveHTML(); 
$images = $dom->getElementsByTagName('img'); 
foreach ($images as $image) 
{ 
echo $image->getAttribute('src'); 
}

What happens is that nothing is printed. Did I do something wrong in the code?


2013-04-09 Leonid

The reason you don't get an error message is probably this line: '@$dom->loadHTML($url);'. In PHP, '@' suppresses all error messages from that function call. – 2013-04-09 07:32:10

I removed it a moment ago, but there is still no result... – Leonid 2013-04-09 07:34:07

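You don't get a result because '$dom->loadHTML()' expects HTML. You are giving it a URL; you first need to get the HTML of the page you want to parse. You can use 'file_get_contents()' for that. (See the answer.) – 2013-04-09 07:36:13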

You don't get a result because $dom->loadHTML() expects HTML. You are giving it a URL, so the URL string itself is parsed as if it were the markup. You first need to fetch the HTML of the page you want to parse. You can use file_get_contents() for that.

I use this in my image grabber class. It works fine for me.

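// Fetch the page markup first; loadHTML() parses an HTML string, not a URL.
$html = file_get_contents('http://www.google.com/');
$dom = new DOMDocument();
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    // Print each image's src attribute on its own line.
    echo $image->getAttribute('src'), PHP_EOL;
}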
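To see why the original attempt printed nothing, the difference can be reproduced without any network access. The sketch below is only an illustration: `imageSources` is a hypothetical helper, and both input strings are made-up samples. It passes a bare URL and then real markup to the same function.

```php
<?php
// Hypothetical helper: returns the src attribute of every <img> in $markup.
function imageSources(string $markup): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($markup);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $image) {
        $sources[] = $image->getAttribute('src');
    }
    return $sources;
}

// A URL is parsed as plain text, so the document contains no <img> elements.
$fromUrl = imageSources('http://example.com/');

// Real markup (a made-up sample) contains the elements we are looking for.
$fromHtml = imageSources(
    '<html><body><img src="a.png"><img src="b.jpg"></body></html>'
);

var_dump($fromUrl);
var_dump($fromHtml);
```

The first call returns an empty array, which matches the "nothing is printed" symptom in the question; the second returns the two src values.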


2013-04-09 07:29:36

I'm now getting an "Attribute class redefined in Entity" error. '$dom = new DOMDocument; $htmls = file_get_contents("http://philcooke.com/inspiration-happens-but-the-best-ideas-take-time/"); $dom->loadHTML($htmls);' – Leonid 2013-04-09 08:34:30

Your answer is almost correct. Just add an '@' character before '$dom->loadHTML($html)'. – Leonid 2013-04-09 08:40:17

Instead of prepending '@' to '$dom->loadHTML($html)' to suppress the errors, you could clean up the HTML with tidy first: '$tidy = tidy_parse_string($html); $html = $tidy->html()->value;'. But maybe that is overkill. – 2013-11-28 08:09:01
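Another option, not mentioned in the thread, is libxml's internal error handling: libxml_use_internal_errors(true) keeps parser messages out of the output while still letting you inspect them, which the '@' operator does not. A minimal sketch follows. The markup is a made-up sample, and its duplicated class attribute is the kind of input that is expected to produce the "Attribute class redefined" message; the exact number of messages may depend on the libxml2 version.

```php
<?php
// Made-up sample markup; the first <img> repeats its class attribute.
$html = '<html><body>'
      . '<img class="a" class="b" src="photo.png">'
      . '<img src="logo.gif">'
      . '</body></html>';

// Collect parser messages internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Keep the collected messages for inspection, then restore the old setting.
$messages = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$sources = [];
foreach ($dom->getElementsByTagName('img') as $image) {
    $sources[] = $image->getAttribute('src');
}

foreach ($sources as $src) {
    echo $src, PHP_EOL;
}
foreach ($messages as $message) {
    // Each entry is a LibXMLError object with a line number and message text.
    echo 'line ', $message->line, ': ', trim($message->message), PHP_EOL;
}
```

The images are still found despite the malformed attribute, and any parser complaints are available in $messages for logging rather than being silently discarded.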
