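PHP DOM UTF-8 problem

First of all, my database uses Windows-1250 as its native character set, and I output the data as UTF-8. On my website I use the iconv() function to convert Windows-1250 strings to UTF-8, and that works perfectly.
The problem appears when I use PHP DOM to parse HTML stored in the database. The HTML is the output of a WYSIWYG editor and is not a valid document: it has no html, head, or body tags.
The HTML may look like this, for example: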
<p>Hello</p>
Here is the method I use to parse HTML from the database:
private function ParseSlideContent($slideContent)
{
    var_dump(iconv('Windows-1250', 'UTF-8', $slideContent)); // this outputs the HTML ok with all special characters

    $doc = new DOMDocument('1.0', 'UTF-8');

    // hack to preserve UTF-8 characters
    $html = iconv('Windows-1250', 'UTF-8', $slideContent);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    $doc->preserveWhiteSpace = false;

    // route image sources through the image locator
    foreach ($doc->getElementsByTagName('img') as $t) {
        $path = trim($t->getAttribute('src'));
        $t->setAttribute('src', '/clientarea/utils/locate-image?path=' . urlencode($path));
    }

    // route Flash object parameters through the Flash locator
    foreach ($doc->getElementsByTagName('object') as $o) {
        foreach ($o->getElementsByTagName('param') as $p) {
            $path = trim($p->getAttribute('value'));
            $p->setAttribute('value', '/clientarea/utils/locate-flash?path=' . urlencode($path));
        }
    }

    // copy the live node list to an array first, because embeds are replaced inside the loop
    foreach (iterator_to_array($doc->getElementsByTagName('embed')) as $e) {
        if (true === $e->hasAttribute('pluginspage')) {
            $path = trim($e->getAttribute('src'));
            $e->setAttribute('src', '/clientarea/utils/locate-flash?path=' . urlencode($path));
        } else {
            // end() takes its argument by reference, so it needs a real variable
            $parts = explode('data/media/video/', trim($e->getAttribute('src')));
            $path = 'data/media/video/' . end($parts);
            $path = '/clientarea/utils/locate-video?path=' . urlencode($path);
            $width = $e->getAttribute('width') . 'px';
            $height = $e->getAttribute('height') . 'px';

            // replace the video embed with a link for the player
            $a = $doc->createElement('a', '');
            $a->setAttribute('href', $path);
            $a->setAttribute('style', "display:block;width:$width;height:$height;");
            $a->setAttribute('class', 'player');
            $e->parentNode->replaceChild($a, $e);
            $this->slideContainsVideo = true;
        }
    }

    // return only the contents of the body element
    $html = trim($doc->saveHTML());
    $html = explode('<body>', $html);
    $html = explode('</body>', $html[1]);
    return $html[0];
}
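The output of the above method is garbage: all special characters are replaced with weird sequences such as ÄÄÄÄÄ.
One more thing: it does work on my development server, but not on the production server.
Any suggestions?
Production server PHP version: PHP 5.2.0RC4-dev
Development server PHP version: PHP 5.2.13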
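For readers trying to recognise the symptom, here is a minimal sketch of one way output like ÄÄÄÄÄ can arise. It assumes the UTF-8 bytes are being read as ISO-8859-1 somewhere along the way, which is only a guess about what happens on the production server. The sample character "č" is a hypothetical example, not taken from the real data.

```php
<?php
// Hypothetical sample: "č" (U+010D) is the two bytes 0xC4 0x8D in UTF-8.
$utf8 = "\xC4\x8D";

// Treat those bytes as ISO-8859-1 and re-encode them as UTF-8, which is
// effectively what a consumer assuming Latin-1 input would do.
$garbled = iconv('ISO-8859-1', 'UTF-8', $utf8);

// Prints c384c28d: "Ä" (0xC3 0x84) followed by an invisible control character.
echo bin2hex($garbled), "\n";
```

Many Central European letters start with the byte 0xC4 or 0xC5 in UTF-8, so a text full of them would show up as a run of Ä and Å characters under this assumption.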
UPDATE:
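I have been working on a solution myself. I got the idea from this PHP bug report (which is not really a bug): http://bugs.php.net/bug.php?id=32547
This is the solution I came up with. I will try it tomorrow and let you know whether it works: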
private function ParseSlideContent($slideContent)
{
    var_dump(iconv('Windows-1250', 'UTF-8', $slideContent)); // this outputs the HTML ok with all special characters

    $doc = new DOMDocument('1.0', 'UTF-8');

    // hack to preserve UTF-8 characters
    $html = iconv('Windows-1250', 'UTF-8', $slideContent);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    $doc->preserveWhiteSpace = false;

    // this might work
    // it basically just adds head and meta tags to the document
    $html = $doc->getElementsByTagName('html')->item(0);
    $head = $doc->createElement('head', '');
    $meta = $doc->createElement('meta', '');
    $meta->setAttribute('http-equiv', 'Content-Type');
    $meta->setAttribute('content', 'text/html; charset=utf-8');
    $head->appendChild($meta);
    // re-attach body after head so that head comes first
    $body = $doc->getElementsByTagName('body')->item(0);
    $html->removeChild($body);
    $html->appendChild($head);
    $html->appendChild($body);

    // route image sources through the image locator
    foreach ($doc->getElementsByTagName('img') as $t) {
        $path = trim($t->getAttribute('src'));
        $t->setAttribute('src', '/clientarea/utils/locate-image?path=' . urlencode($path));
    }

    // route Flash object parameters through the Flash locator
    foreach ($doc->getElementsByTagName('object') as $o) {
        foreach ($o->getElementsByTagName('param') as $p) {
            $path = trim($p->getAttribute('value'));
            $p->setAttribute('value', '/clientarea/utils/locate-flash?path=' . urlencode($path));
        }
    }

    // copy the live node list to an array first, because embeds are replaced inside the loop
    foreach (iterator_to_array($doc->getElementsByTagName('embed')) as $e) {
        if (true === $e->hasAttribute('pluginspage')) {
            $path = trim($e->getAttribute('src'));
            $e->setAttribute('src', '/clientarea/utils/locate-flash?path=' . urlencode($path));
        } else {
            // end() takes its argument by reference, so it needs a real variable
            $parts = explode('data/media/video/', trim($e->getAttribute('src')));
            $path = 'data/media/video/' . end($parts);
            $path = '/clientarea/utils/locate-video?path=' . urlencode($path);
            $width = $e->getAttribute('width') . 'px';
            $height = $e->getAttribute('height') . 'px';

            // replace the video embed with a link for the player
            $a = $doc->createElement('a', '');
            $a->setAttribute('href', $path);
            $a->setAttribute('style', "display:block;width:$width;height:$height;");
            $a->setAttribute('class', 'player');
            $e->parentNode->replaceChild($a, $e);
            $this->slideContainsVideo = true;
        }
    }

    // return only the contents of the body element
    $html = trim($doc->saveHTML());
    $html = explode('<body>', $html);
    $html = explode('</body>', $html[1]);
    return $html[0];
}
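A variation on the same idea, offered only as a sketch: the method above adds the meta tag to the DOM after loadHTML() has already parsed the input, so the tag may arrive too late to influence how the bytes were decoded. The snippet below instead declares the charset in the markup string before parsing. The fragment and variable names are hypothetical, and this has not been checked against PHP 5.2.0RC4-dev.

```php
<?php
// Hypothetical fragment, already converted to UTF-8 (here: "<p>č</p>").
$fragment = "<p>\xC4\x8D</p>";

$doc = new DOMDocument();

// Wrap the fragment in a full document whose head declares the charset,
// so the HTML parser sees the declaration before any non-ASCII bytes.
$doc->loadHTML(
    '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    . '</head><body>' . $fragment . '</body></html>'
);

$p = $doc->getElementsByTagName('p')->item(0);

// DOM strings are UTF-8 internally, so this should print c48d
// if the input was decoded as UTF-8.
echo bin2hex($p->nodeValue), "\n";
```

Because the wrapper supplies its own html, head, and body tags, the existing explode() calls on '<body>' and '</body>' would still find the fragment in the saveHTML() output.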
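Are you sure you are sending the proper Content-Type header? That is, if you open the page in Firefox, check whether View -> Character Encoding is set to UTF-8. – 2010-08-23 15:17:03
Have you tried the save method: $doc->save(); – 2010-08-23 15:54:28
@Cem I will try it. Give me a few minutes. – 2010-08-23 16:33:40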