PHP simple_html_dom parser error

I ran into an error while using the simple_html_dom class.

The HTML string I need to parse looks like this.

<!DOCTYPE html> 
<html lang="en"> 
<head> 
<title>Y-shaped ZnO Nanobelts Driven from Twinned</title> 

<meta name="site" content="Reports"/> 

<meta name="description" content="Description with twinned planes {11&#"/> 

<meta name="image" content="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a"/> 


... 


</body> 
</html>

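I tried to get the meta tag named image with find("meta[name=image]"), but nothing was returned.

I looked into the cause and found that it is the characters '&#' in the middle of the following line.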

<meta name="description" content="Description with twinned planes {11&#"/>

The meta tag I get back looks like this:

Description with twinned planes {11&#"/> <meta name="image" ....

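So in this case, what should I do with the content attribute so that simple_html_dom parses the HTML correctly?

Otherwise, is there another library that parses this HTML correctly?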


2017-07-28 Cuza

Isn't the problem that {11&# should be {11&amp;#?
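The suggestion in the comment above can be sketched as a pre-processing step: escape every "&" that does not start a well-formed entity before handing the string to the parser. The helper name escapeStrayAmpersands is hypothetical and not part of simple_html_dom, and the thread does not confirm that this alone fixes the find() call, so treat it as a sketch under that assumption.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): replace each "&" that
// does not begin a named, decimal, or hexadecimal entity with "&amp;".
function escapeStrayAmpersands(string $html): string
{
    return preg_replace(
        '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/',
        '&amp;',
        $html
    );
}

$broken = '<meta name="description" content="Description with twinned planes {11&#"/>';
$fixed  = escapeStrayAmpersands($broken);

echo $fixed, PHP_EOL;
// The cleaned string could then be passed on, e.g. to str_get_html($fixed).
```

Note that the pattern is applied to the whole string, so it would also touch ampersands inside inline scripts or styles; for a full page that may or may not be acceptable.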

Try this code, which uses PHP's DOMDocument.

You can fetch the meta elements with getElementsByTagName and read an attribute value with getAttribute.

$hml = '<!DOCTYPE html> 
<html lang="en"> 
<head> 
<title>Y-shaped ZnO Nanobelts Driven from Twinned</title> 

<meta name="site" content="Reports"/> 

<meta name="description" content="Description with twinned planes {11&#"/> 

<meta name="image" content="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a"/> 
</head> 
<body> 

</body> 
</html>'; 

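// Collect libxml warnings about the malformed markup instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($hml);
libxml_clear_errors();

// Walk every <meta> element and print the content of the one named "image".
$metas = $dom->getElementsByTagName('meta');

foreach ($metas as $meta) {
    if ($meta->getAttribute('name') === 'image') {
        echo $meta->getAttribute('content');
    }
}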

Output:

https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a

Note: if you are loading the content from a page, use $dom->loadHTMLFile("your_pagename.html"); instead of $dom->loadHTML($hml);
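As a variation on the loop above, the same lookup can be written as a single XPath query with DOMXPath, which is closer in spirit to the find("meta[name=image]") selector from the question. This is a minimal sketch, assuming the shortened sample markup below stands in for the real page; the variable name $imageUrl is only illustrative.

```php
<?php
// Shortened copy of the sample markup, including the malformed "&#" attribute.
$hml = '<!DOCTYPE html>
<html lang="en">
<head>
<title>Y-shaped ZnO Nanobelts Driven from Twinned</title>
<meta name="description" content="Description with twinned planes {11&#"/>
<meta name="image" content="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a"/>
</head>
<body></body>
</html>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($hml);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// string(...) yields the content attribute of the first match, or '' if none.
$imageUrl = $xpath->evaluate('string(//meta[@name="image"]/@content)');

echo $imageUrl, PHP_EOL;
```

An empty string from evaluate() means no matching meta tag was found, which makes the "not found" case easy to check without a loop.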


2017-07-28 12:06:34 NID
