2013-09-22

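Remove DOMDocument warnings when parsing page content

I want to parse the content of any URL, and the result should not contain any HTML code. This works, but reading the content of the given URL produces a bunch of warnings. How can I get rid of them?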

<?php 
$url= 'http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page'; 
$doc = new DOMDocument(); 
$doc->loadHTMLFile($url); 
$xpath = new DOMXPath($doc); 
foreach($xpath->query("//script") as $script) { 
    $script->parentNode->removeChild($script); 
} 
$textContent = $doc->textContent; //inherited from DOMNode 
echo $textContent; 
?> 

Warnings:

... content-from-a-web-page, line: 255 in /opt/lampp/htdocs/FB/ec2/test.php on line 13 

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 255 in /opt/lampp/htdocs/FB/ec2/test.php on line 13 

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13 

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13 

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13 

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13 

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13 

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13 

Warning: DOMDocument::loadHTMLFile(): ID display-name already defined in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 731 in /opt/lampp/htdocs/FB/ec2/test.php on line 13 
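These messages come from libxml2, the parser behind DOMDocument, and describe problems in the fetched page's markup rather than in the PHP code: "htmlParseEntityRef: expecting ';'" typically means a bare & (for example in a URL query string) that was not written as &amp;, and "ID ... already defined" means two elements share the same id. As a hedged sketch, the made-up HTML fragment below (not taken from the page in question) contains both problems, and the parser's complaints are read back through libxml_get_errors() instead of being printed as warnings. The exact message text and line numbers depend on the libxml2 version PHP is built against, so no output is shown.

```php
<?php
// Hypothetical HTML fragment: a bare "&" in a query string and a duplicated id,
// the two kinds of problem reported in the warnings above.
$html = '<html><body>'
      . '<a href="page.php?a=1&b=2">link</a>'
      . '<span id="display-name">x</span><span id="display-name">y</span>'
      . '</body></html>';

$previous = libxml_use_internal_errors(true); // buffer errors instead of emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($html);

foreach (libxml_get_errors() as $error) {
    // Each LibXMLError carries the message and the line in the parsed HTML.
    printf("line %d: %s\n", $error->line, trim($error->message));
}

libxml_clear_errors();                  // empty the error buffer
libxml_use_internal_errors($previous);  // restore the previous setting
```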

Possible duplicate of [DOMDocument::loadHTML error](http://stackoverflow.com/questions/9149180/domdocumentloadhtml-error) – hakre

Answer


You can use libxml_use_internal_errors() like this:

libxml_use_internal_errors(true); 
$doc->loadHTMLFile($url); 
libxml_clear_errors(); 
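A self-contained version of the question's script with these three lines applied might look like the sketch below. Assumption: an inline HTML string stands in for the remote page so the example runs without network access; with a real URL you would call loadHTMLFile($url) instead. The text is read from the root element ($doc->documentElement) rather than from the document node.

```php
<?php
// Assumption: inline HTML instead of a fetched URL, so no network is needed.
// It contains a bare "&", which libxml may complain about.
$html = '<html><body><p>Tom &amp; Jerry & friends</p>'
      . '<script>var x = 1;</script></body></html>';

$doc = new DOMDocument();

libxml_use_internal_errors(true);   // buffer parser errors instead of printing warnings
$doc->loadHTML($html);
libxml_clear_errors();              // discard the buffered errors

// Same clean-up as in the question: drop <script> elements, keep the text.
// DOMXPath::query() returns a snapshot, so removing nodes while looping is safe.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//script') as $script) {
    $script->parentNode->removeChild($script);
}
echo $doc->documentElement->textContent, "\n";
```

Note that this version leaves internal error handling switched on for the rest of the script.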

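As PeeHaa points out in the comment below, it is a good idea to restore the previous error-handling state afterwards. You can do that as follows: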

$errors = libxml_use_internal_errors(true); // store the previous setting 
$doc->loadHTMLFile($url); 
libxml_clear_errors(); 
libxml_use_internal_errors($errors); // reset back to the previous setting 
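The same store-and-restore idea can be wrapped in a reusable function. The sketch below uses a hypothetical helper name, htmlToText(), and takes an HTML string rather than a URL (an assumption, so it runs offline). The try/finally ensures the previous setting is restored even if parsing throws; on PHP 8, for example, loadHTML('') throws a ValueError.

```php
<?php
/**
 * Hypothetical helper: parse an HTML string and return its text without
 * <script> elements, restoring the caller's libxml error setting afterwards.
 */
function htmlToText(string $html): string
{
    $previous = libxml_use_internal_errors(true); // remember the old setting
    try {
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        $xpath = new DOMXPath($doc);
        foreach ($xpath->query('//script') as $script) {
            $script->parentNode->removeChild($script);
        }
        return $doc->documentElement->textContent;
    } finally {
        libxml_clear_errors();                  // drop buffered parser errors
        libxml_use_internal_errors($previous);  // restore the old setting
    }
}

echo htmlToText('<p>a=1&b=2</p><script>alert(1)</script>'), "\n";
```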



Note that you should store the current state of 'libxml_use_internal_errors' and restore it afterwards. – PeeHaa


@PeeHaa: Good idea. I've added it to the answer :) –


@AmalMurali: Thanks a lot. Could you explain how this differs from my code? – user123