2010-09-01 31 views
3

If I try to load an HTML document into the PHP DOM, I get an error along these lines (DOM error: ID 'someAnchor' already defined in Entity, line X):

Error DOMDocument::loadHTML() [domdocument.loadhtml]: ID someAnchor already defined in Entity, line: 9 

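I cannot figure out why. Below is some code that loads an HTML string into the DOM.

The first string does not contain an anchor tag; the second one does. Only the second document produces the error.

You should be able to cut and paste this into a script and run it to see the same output: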

<?php 
ini_set('display_errors', 1); 
error_reporting(E_ALL); 


$stringWithNoAnchor = <<<EOT 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
<title>My document</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
</head> 
<body > 
<h1>Hello</h1> 
</body> 
</html> 
EOT; 

$stringWithAnchor = <<<EOT 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
<title>My document</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
</head> 
<body > 
<h1>Hello</h1> 
<a name="someAnchor" id="someAnchor"></a> 
</body> 
</html> 
EOT; 

class domGrabber 
    { 
    public $_FileErrorStr = ''; 

    /** 
    *@desc DOM object factory does the work of loading the DOM object 
    */ 
    public function getLoadAsDOMObj($htmlString) 
     { 
     $this->_FileErrorStr =''; //reset error container 
     $xmlDoc = new DOMDocument(); 
     set_error_handler(array($this, '_FileErrorHandler')); // Warnings and errors are suppressed 
     $xmlDoc->loadHTML($htmlString); 
     restore_error_handler(); 
     return $xmlDoc; 
     } 

    /** 
    *@desc public so that it can catch errors from outside this class 
    */ 
    public function _FileErrorHandler($errno, $errstr, $errfile, $errline) 
     { 
     if ($this->_FileErrorStr === null) 
      { 
      $this->_FileErrorStr = $errstr; 
      } 
     else { 
      $this->_FileErrorStr .= (PHP_EOL . $errstr); 
      } 
     } 
    } 

$domGrabber = new domGrabber(); 
$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithNoAnchor); 

echo 'PHP Version: '. phpversion() .'<br />'."\n"; 

echo '<pre>'; 
print $xmlDoc->saveXML(); 
echo '</pre>'."\n"; 
if ($domGrabber->_FileErrorStr) 
    { 
    echo 'Error'. $domGrabber->_FileErrorStr; 
    } 



$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithAnchor); 
echo '<pre>'; 
print $xmlDoc->saveXML(); 
echo '</pre>'."\n"; 
if ($domGrabber->_FileErrorStr) 
    { 
    echo 'Error'. $domGrabber->_FileErrorStr; 
    } 

I get the following output when I view the page source in Firefox:

PHP Version: 5.2.9<br /> 
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body> 
<h1>Hello</h1> 
</body></html> 
</pre> 
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body> 
<h1>Hello</h1> 
<a name="someAnchor" id="someAnchor"></a> 

</body></html> 
</pre> 
Error 
DOMDocument::loadHTML() [<a href='domdocument.loadhtml'>domdocument.loadhtml</a>]: ID someAnchor already defined in Entity, line: 9 

So why is the DOM saying that someAnchor has already been defined?


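Update:

I experimented with two changes:

  • Instead of loadHTML() I used the loadXML() method, and that fixed it.
  • Instead of having both id and name attributes I used only the id attribute, and that fixed it too.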

For the sake of completeness, here is the comparison script:

<?php 
ini_set('display_errors', 1); 
error_reporting(E_ALL); 


$stringWithNoAnchor = <<<EOT 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
<title>My document</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
</head> 
<body > 
<p>stringWithNoAnchor</p> 
</body> 
</html> 
EOT; 

$stringWithAnchor = <<<EOT 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
<title>My document</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
</head> 
<body > 
<p>stringWithAnchor</p> 
<a name="someAnchor" id="someAnchor" ></a> 
</body> 
</html> 
EOT; 

$stringWithAnchorButOnlyIdAtt = <<<EOT 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
<title>My document</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
</head> 
<body > 
<p>stringWithAnchorButOnlyIdAtt</p> 
<a id="someAnchor"></a> 
</body> 
</html> 
EOT; 

class domGrabber 
    { 
    public $_FileErrorStr = ''; 
    public $useHTMLMethod = TRUE; 

    /** 
    *@desc DOM object factory does the work of loading the DOM object 
    */ 
    public function loadDOMObjAndWriteOut($htmlString) 
     { 
     $this->_FileErrorStr =''; 

     $xmlDoc = new DOMDocument(); 
     set_error_handler(array($this, '_FileErrorHandler')); // Warnings and errors are suppressed 


     if ($this->useHTMLMethod) 
      { 
      $xmlDoc->loadHTML($htmlString); 
      } 
     else { 
      $xmlDoc->loadXML($htmlString); 
      } 


     restore_error_handler(); 

     echo "<h1>"; 
     echo ($this->useHTMLMethod) ? 'using xmlDoc->loadHTML() ' : 'using $xmlDoc->loadXML()'; 
     echo "</h1>"; 
     echo '<pre>'; 
     print $xmlDoc->saveXML(); 
     echo '</pre>'."\n"; 
     if ($this->_FileErrorStr) 
      { 
      echo 'Error'. $this->_FileErrorStr; 
      } 
     } 

    /** 
    *@desc public so that it can catch errors from outside this class 
    */ 
    public function _FileErrorHandler($errno, $errstr, $errfile, $errline) 
     { 
     if ($this->_FileErrorStr === null) 
      { 
      $this->_FileErrorStr = $errstr; 
      } 
     else { 
      $this->_FileErrorStr .= (PHP_EOL . $errstr); 
      } 
     } 
    } 

$domGrabber = new domGrabber(); 

echo 'PHP Version: '. phpversion() .'<br />'."\n"; 

$domGrabber->useHTMLMethod = TRUE; //DOM->loadHTML 
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor); 
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor); 
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt); 

$domGrabber->useHTMLMethod = FALSE; //use DOM->loadXML 
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor); 
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor); 
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt); 
+0

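Update: I have found that HTML Tidy was adding the id attributes to my XHTML document. I am looking into http://tidy.sourceforge.net/docs/quickref.html#anchor-as-name and currently get a new error: tidy::parseFile() [tidy.parsefile]: Unknown Tidy Configuration Option 'anchor-as-name'. But that is a separate (though related) issue. Just for your interest. – 2010-09-01 03:09:02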

+0

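Just another note: loadXML() does not work if the document contains entities, for example £ or £ (I think I could declare them, but that defeats the point of XHTML). So I have found that loadHTML() has to be used, together with anchors that do not have both the id and name attributes set to the same value. – 2010-09-01 05:08:44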

Answers

6

If you are loading XML files (which is the case here, since XHTML is XML), then you should use DOMDocument::loadXML(), not DOMDocument::loadHTML().

In HTML, both name and id introduce an ID. So you are repeating the ID "someAnchor", hence the error.
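To see exactly what the parser complains about, the messages can be collected through libxml's own error buffer instead of a custom PHP error handler. This is only a minimal sketch: the `$html` snippet and the `$messages` array are made up for illustration, and whether the "already defined" message shows up at all depends on the libxml2 version PHP was built against.

```php
<?php
// Illustrative input: an anchor whose name and id carry the same value.
$html = '<html><body><a name="someAnchor" id="someAnchor"></a></body></html>';

// Buffer libxml errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Each buffered entry is a LibXMLError object (message, line, level, code).
$messages = array();
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the earlier setting

// May be empty on libxml2 builds that do not report the duplicate ID.
print_r($messages);
```

Even when the message is reported, the document is still loaded (the question's output shows the anchor in the serialized result), so the message behaves like a warning rather than a fatal parse failure.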

However, the W3C validator allows the repeated ID in the form you show: <a id="someAnchor" name="someAnchor"></a>. This may be a bug in libxml2.

In this bug report for libxml2, a user proposes a patch so that only the name attribute is considered as an ID:

According to the HTML and XHTML specs, only the a element's name attribute shares name space with id attributes. For some of the elements it can be argued that multiple instances with the same name don't make sense, but they should nevertheless not be considered in the same namespace as other elements' id attributes.

See http://www.zvon.org/xxl/xhtmlReference/Output/Strict/attr_name.html for all the elements that take name attributes and their semantics.
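If loadHTML() has to be used and the markup cannot be changed at the source, one possible workaround is to drop the redundant name attribute before parsing. The sketch below is an assumption, not something from the bug report or from libxml2: `dropRedundantAnchorNames()` is a hypothetical helper, it relies on regular expressions (so it will not cope with every markup variation), and it uses an anonymous function, which needs PHP 5.3 or later.

```php
<?php
/**
 * Hypothetical helper (not part of PHP or libxml2): remove the name
 * attribute from <a> tags when it only repeats the value of id.
 */
function dropRedundantAnchorNames($html)
{
    return preg_replace_callback('/<a\b[^>]*>/i', function ($m) {
        $tag = $m[0];
        if (preg_match('/\sid\s*=\s*(["\'])(.*?)\1/i', $tag, $id)
            && preg_match('/\sname\s*=\s*(["\'])(.*?)\1/i', $tag, $name)
            && $id[2] === $name[2]) {
            // Strip the name attribute text that was matched above.
            $tag = str_replace($name[0], '', $tag);
        }
        return $tag;
    }, $html);
}

$html  = '<p><a name="someAnchor" id="someAnchor"></a><a name="other" id="x"></a></p>';
$clean = dropRedundantAnchorNames($html);

echo $clean, "\n";
```

Anchors whose name and id differ are left untouched, so links that target a name elsewhere in the page keep working; the cleaned string can then be passed to DOMDocument::loadHTML() as before.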

+0

Just adding this blog post as a [quick fix](http://www.eatmybusiness。COM /食品/ 2010/09/01/HTML的整潔修復換錨標記,已定義 - 在實體線/ 170 /). – 2012-02-07 22:01:03