2014-01-21 61 views

I have imported content from a Blogger account into a WordPress blog. Why is a DOCTYPE being printed on my page?

I had to apply some XPath and regular expressions to remove some nasty formatting.

global $post;
$html = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//br[not(preceding::text())]') as $node) {
    $node->parentNode->removeChild($node);
}
$nodes = $xpath->query('//a[string-length(.) = 0]');
foreach ($nodes as $node) {
    $node->parentNode->removeChild($node);
}
$nodes = $xpath->query('//*[not(text() or node() or self::br)]');
foreach ($nodes as $node) {
    $node->parentNode->removeChild($node);
}
remove_filter('the_content', 'wpautop');
$content = $doc->saveHTML();
$content = ltrim($content, '<br>');
$content = strip_tags($content, '<br> <a> <iframe>');
$content = preg_replace(array('/(<br\s*\/?>\s*){1,}/'), array('<br/><br/>'), $content);
$content = str_replace('&nbsp;', ' ', $content);
$content = "<p>".implode("</p>\n\n<p>", preg_split('/\n(?:\s*\n)+/', $content))."</p>";
return $content;

For some reason, though, a stray DOCTYPE is printed inside my page, and I don't know why.

<p>!DOCTYPE html PUBLIC &#8220;-//W3C//DTD HTML 4.0 Transitional//EN&#8221; &#8220;http://www.w3.org/TR/REC-html40/loose.dtd&#8221;> 
    <br/> 
    <br/>When the battle is on between contestants in a talent show, it gets really competitive when down to the last four. X-FactorUSAcontestant Marcus Canty knows this all too well as this is the stage he was voted off of the show earlier this year. 
    <br/> 
    <br/> 
</p> 

Can anyone point out why this is happening?

Answers


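When you load a fragment of HTML with DOMDocument, the doctype and the html, head, and body tags are added automatically (if they are missing), and unclosed tags are closed, so that the fragment becomes a "valid" HTML document. When you then call saveHTML, you save all of that. If I remember correctly, you can find several tricks to avoid this in the PHP manual (in the user comments).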
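One common way to save only the fragment is to serialize the children of the implied body element instead of the whole document. The following is a minimal sketch of that idea, not necessarily the trick the manual comments describe; the helper name `save_body_fragment` and the sample markup are hypothetical, and PHP 7 or later is assumed for the type declarations.

```php
<?php
// Hypothetical helper: returns only the markup inside <body>, so the
// DOCTYPE and the implied <html>/<body> wrappers are not serialized.
function save_body_fragment(DOMDocument $doc): string
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() accepts a node argument since PHP 5.3.6.
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

// Sample fragment (an assumption, standing in for the post content).
$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of using @
$doc->loadHTML('<div>Hello<br>world</div><p>Second</p>');
libxml_clear_errors();

echo save_body_fragment($doc);
```

In the question's code, this would take the place of the bare `$doc->saveHTML()` call, so the later `strip_tags` and regex steps never see the DOCTYPE line.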


Ah, I see. So I need to find a way to stop DOMDocument from applying its DOCTYPE when using saveHTML? – UzumakiDev


@UzumakiDev: No, you can't. See the PHP manual (or Stack Overflow) for a trick to save only the code fragment. –


@UzumakiDev: Take a look here: http://stackoverflow.com/questions/6851620/how-to-prevent-the-doctype-from-being-added-to-the-html –
