2010-10-16 70 views
0

XMLReader - problem with UTF characters

I am parsing a huge XML file, and the file declares its encoding as
<?xml version="1.0" encoding="ISO-8859-1"?>

The database encoding is UTF-8. Before saving anything to the DB I run this query:
$sql = 'SET NAMES "utf8" COLLATE "utf8_swedish_ci"';

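The problem is that sometimes non-standard characters show up in the XML file, like
Lycka™: Rome
I know the trademark symbol comes from the Windows-1252 encoding, not ISO-8859-1.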
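
To illustrate the mismatch, here is a minimal sketch, assuming PHP's iconv extension is available and that the trademark sign is stored in the file as the single byte 0x99 (the variable names are only for illustration). It decodes the same byte under both encodings:

```php
<?php
// The same byte, decoded under the two encodings in question.
$byte = "\x99";

// ISO-8859-1 maps 0x99 to the C1 control character U+0099 (UTF-8: C2 99).
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);

// Windows-1252 maps 0x99 to U+2122 TRADE MARK SIGN (UTF-8: E2 84 A2).
$asCp1252 = iconv('CP1252', 'UTF-8', $byte);

echo bin2hex($asLatin1), "\n"; // c299
echo bin2hex($asCp1252), "\n"; // e284a2
```

So a parser that trusts the ISO-8859-1 declaration would most likely hand back an invisible control character where the ™ should be.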

I'm using PHP. I tried utf8_encode.

Here is what gets saved in the database: [image]

Here is the output in the browser: [image]

I want it converted to UTF-8, that's all.

Answers

0

I use this code, and it works fine with PHP:

function cp1252_to_utf8($str) 
{ 

     $cp1252_map = array(
       "\xc2\x80" => "\xe2\x82\xac", /* EURO SIGN */ 
       "\xc2\x82" => "\xe2\x80\x9a", /* SINGLE LOW-9 QUOTATION MARK */ 
       "\xc2\x83" => "\xc6\x92",  /* LATIN SMALL LETTER F WITH HOOK */ 
       "\xc2\x84" => "\xe2\x80\x9e", /* DOUBLE LOW-9 QUOTATION MARK */ 
       "\xc2\x85" => "\xe2\x80\xa6", /* HORIZONTAL ELLIPSIS */ 
       "\xc2\x86" => "\xe2\x80\xa0", /* DAGGER */ 
       "\xc2\x87" => "\xe2\x80\xa1", /* DOUBLE DAGGER */ 
       "\xc2\x88" => "\xcb\x86",  /* MODIFIER LETTER CIRCUMFLEX ACCENT */ 
       "\xc2\x89" => "\xe2\x80\xb0", /* PER MILLE SIGN */ 
       "\xc2\x8a" => "\xc5\xa0",  /* LATIN CAPITAL LETTER S WITH CARON */ 
       "\xc2\x8b" => "\xe2\x80\xb9", /* SINGLE LEFT-POINTING ANGLE QUOTATION */ 
       "\xc2\x8c" => "\xc5\x92",  /* LATIN CAPITAL LIGATURE OE */ 
       "\xc2\x8e" => "\xc5\xbd",  /* LATIN CAPITAL LETTER Z WITH CARON */ 
       "\xc2\x91" => "\xe2\x80\x98", /* LEFT SINGLE QUOTATION MARK */ 
       "\xc2\x92" => "\xe2\x80\x99", /* RIGHT SINGLE QUOTATION MARK */ 
       "\xc2\x93" => "\xe2\x80\x9c", /* LEFT DOUBLE QUOTATION MARK */ 
       "\xc2\x94" => "\xe2\x80\x9d", /* RIGHT DOUBLE QUOTATION MARK */ 
       "\xc2\x95" => "\xe2\x80\xa2", /* BULLET */ 
       "\xc2\x96" => "\xe2\x80\x93", /* EN DASH */ 
       "\xc2\x97" => "\xe2\x80\x94", /* EM DASH */ 

       "\xc2\x98" => "\xcb\x9c",  /* SMALL TILDE */ 
       "\xc2\x99" => "\xe2\x84\xa2", /* TRADE MARK SIGN */ 
       "\xc2\x9a" => "\xc5\xa1",  /* LATIN SMALL LETTER S WITH CARON */ 
       "\xc2\x9b" => "\xe2\x80\xba", /* SINGLE RIGHT-POINTING ANGLE QUOTATION*/ 
       "\xc2\x9c" => "\xc5\x93",  /* LATIN SMALL LIGATURE OE */ 
       "\xc2\x9e" => "\xc5\xbe",  /* LATIN SMALL LETTER Z WITH CARON */ 
       "\xc2\x9f" => "\xc5\xb8"  /* LATIN CAPITAL LETTER Y WITH DIAERESIS*/ 
     ); 

     return strtr(utf8_encode($str), $cp1252_map); 
} 


$str = cp1252_to_utf8(iconv("UTF-8", "ISO-8859-1//TRANSLIT", $str)); 
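
The lookup table does not have to be typed out by hand. As a rough sketch (not what I actually run), it could be generated with iconv, assuming the iconv build knows the "CP1252" charset name. build_cp1252_map() is a made-up helper name, and bytes that Windows-1252 leaves undefined are skipped when iconv rejects them:

```php
<?php
// Hypothetical helper: builds a "UTF-8 encoded C1 control" => "real character"
// table for the 0x80-0x9F range, using iconv instead of a hand-written array.
function build_cp1252_map(): array
{
    $map = [];
    for ($b = 0x80; $b <= 0x9F; $b++) {
        // Bytes undefined in Windows-1252 make iconv fail on some systems;
        // @ hides the resulting notice and the entry is skipped.
        $utf8 = @iconv('CP1252', 'UTF-8', chr($b));
        if ($utf8 !== false && $utf8 !== '') {
            // U+0080..U+009F are encoded in UTF-8 as C2 80 .. C2 9F.
            $map["\xC2" . chr($b)] = $utf8;
        }
    }
    return $map;
}

$map = build_cp1252_map();
echo bin2hex($map["\xC2\x99"]), "\n"; // e284a2 (TRADE MARK SIGN)
```

The resulting array can be passed to strtr() the same way as $cp1252_map above.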
0

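Did you try encoding the string as UTF-8 before saving it to the DB? PHP has the utf8_encode() function for this, and the language you are using probably has something similar.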
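
One thing to watch for: utf8_encode() treats every input byte as ISO-8859-1, so if the string is already UTF-8 it gets encoded a second time. A small sketch of that effect, assuming the parser has already produced UTF-8 with U+0099 in place of the trademark sign ($fromParser is a made-up sample value, and iconv is used here as a stand-in that performs the same ISO-8859-1 to UTF-8 conversion):

```php
<?php
// Hypothetical value as it might come out of the parser: already UTF-8,
// with U+0099 (bytes C2 99) where the trademark sign should be.
$fromParser = "Lycka\xC2\x99: Rome";

// Same conversion utf8_encode() performs: each byte is read as ISO-8859-1.
$encodedAgain = iconv('ISO-8859-1', 'UTF-8', $fromParser);

// The two bytes C2 99 have become four: C3 82 ("Â") followed by C2 99.
echo bin2hex(substr($encodedAgain, 5, 4)), "\n"; // c382c299
```

If a stray "Â" shows up in the stored value, double encoding of this kind is a likely cause.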

+0

I'm using PHP. Yes, I tried utf8_encode. Here is what gets saved in the DB: "LyckaÂ: Rome", and when I try to decode it, it shows up as "Lycka: Rome" – 2010-10-16 12:48:04

+0

I have updated the description, please check – 2010-10-16 12:52:53

+0

I think you will have to use the multibyte functions for the encoding. Use mb_convert_encoding() http://php.net/manual/en/function.mb-convert-encoding.php – Nands 2010-10-17 10:50:21
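
A minimal sketch of that suggestion, assuming the raw bytes are really Windows-1252 and the mbstring extension is loaded (the snippet falls back to iconv if it is not; $raw is a made-up sample value). Note that it applies to the original bytes from the file, not to a string the parser has already converted to UTF-8:

```php
<?php
// Raw bytes as they would sit in the file: 0x99 is the Windows-1252 trademark sign.
$raw = "Lycka\x99: Rome";

// Convert from Windows-1252 (not ISO-8859-1) to UTF-8.
// Falls back to iconv when mbstring is not installed.
$utf8 = function_exists('mb_convert_encoding')
    ? mb_convert_encoding($raw, 'UTF-8', 'Windows-1252')
    : iconv('CP1252', 'UTF-8', $raw);

echo $utf8, "\n"; // Lycka™: Rome
```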