2013-10-07 98 views

PHP HTML DOM, XPath - strange characters?

Suppose $html_dom contains a page with HTML entities. With the code below, I get strange characters in the output.

$html_dom = new DOMDocument(); 
@$html_dom->loadHTML($html_doc); 
$xpath = new DOMXPath($html_dom); 

$query = '//div[@class="foo"]/div/p'; 
$my_foos = $xpath->query($query); 
foreach ($my_foos as $my_foo) 
{ 
    echo html_entity_decode($my_foo->nodeValue); 
    die; 
} 

How do I handle this properly so that I don't get weird characters? I tried the following, without success:

$html_doc = mb_convert_encoding($html_doc, 'HTML-ENTITIES', 'UTF-8'); 
$html_dom = new DOMDocument(); 
$html_dom->resolveExternals = TRUE; 
@$html_dom->loadHTML($html_doc); 
$xpath = new DOMXPath($html_dom); 

$query = '//div[@class="foo"]/div/p'; 
$my_foos = $xpath->query($query); 
foreach ($my_foos as $my_foo) 
{ 
    echo html_entity_decode($my_foo->nodeValue); 
    die; 
} 

Answer

mb_convert_encoding is a good idea, but it doesn't work as expected here, because DOMDocument seems to be a bit buggy when it comes to encoding.

Moving mb_convert_encoding to the point where the node is actually output did the trick.

$html_dom = new DOMDocument(); 
$html_dom->resolveExternals = TRUE; 
@$html_dom->loadHTML($html_doc); 
$xpath = new DOMXPath($html_dom); 

$query = '//div[@class="foo"]/div/p'; 
$my_foos = $xpath->query($query); 
foreach ($my_foos as $my_foo) 
{ 
    echo mb_convert_encoding($my_foo->nodeValue, 'HTML-ENTITIES', 'UTF-8'); 
    die; 
} 
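
As a side note, the 'HTML-ENTITIES' pseudo-encoding of mb_convert_encoding is deprecated as of PHP 8.2. A minimal sketch of the same idea (encode at output time) using mb_encode_numericentity might look like the following. It assumes the dom and mbstring extensions are loaded; the sample markup and the helper name first_match_as_entities are made up for illustration and are not from the question.

```php
<?php
// Hypothetical sample page; the entities are illustrative only.
$html_doc = '<div class="foo"><div><p>caf&eacute;&nbsp;au&nbsp;lait</p></div></div>';

/**
 * Return the text of the first node matching $query, with every
 * non-ASCII character written as a numeric entity, or null if
 * nothing matches. (Helper name is an assumption for this sketch.)
 */
function first_match_as_entities(string $html_doc, string $query): ?string
{
    $html_dom = new DOMDocument();
    // Parser warnings about sloppy markup are suppressed, as above.
    @$html_dom->loadHTML($html_doc);
    $xpath = new DOMXPath($html_dom);

    foreach ($xpath->query($query) as $node) {
        // nodeValue is UTF-8 and has its entities already decoded.
        // Map: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF.
        return mb_encode_numericentity(
            $node->nodeValue,
            [0x80, 0x10FFFF, 0, 0x1FFFFF],
            'UTF-8'
        );
    }
    return null;
}

echo first_match_as_entities($html_doc, '//div[@class="foo"]/div/p'), "\n";
```

For the sample input this prints caf&#233;&#160;au&#160;lait, which a browser renders correctly in any ASCII-compatible page charset. If the page is already served as UTF-8, echoing $my_foo->nodeValue directly should be enough, since the DOM extension returns UTF-8 strings.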

Confirmed that it works. Thanks. – StackOverflowNewbie