mbstring support is enabled (but not loaded) by default as of PHP 5.4.0. Loading the extension lets you do things like this:
<?php // PHP 5.4+
$ensureIsUTF8 = static function($data){
    $dataEncoding = \mb_detect_encoding(
        $data,
        ['UTF-8', 'windows-1251', 'iso-8859-1', /*others you encounter*/],
        true
    );
    //UTF-16/32 encoding detection always fails for PHP <= 5.4.1
    //Use detection code copied from PHP docs comments:
    //http://www.php.net/manual/en/function.mb-detect-encoding.php
    if ($dataEncoding === false){
        $UTF32_BIG_ENDIAN_BOM = chr(0x00) . chr(0x00) . chr(0xFE) . chr(0xFF);
        $UTF32_LITTLE_ENDIAN_BOM = chr(0xFF) . chr(0xFE) . chr(0x00) . chr(0x00);
        $UTF16_BIG_ENDIAN_BOM = chr(0xFE) . chr(0xFF);
        $UTF16_LITTLE_ENDIAN_BOM = chr(0xFF) . chr(0xFE);
        $first2 = \substr($data, 0, 2);
        $first4 = \substr($data, 0, 4);
        if ($first4 === $UTF32_BIG_ENDIAN_BOM) {
            $dataEncoding = 'UTF-32BE';
        } elseif ($first4 === $UTF32_LITTLE_ENDIAN_BOM) {
            $dataEncoding = 'UTF-32LE';
        } elseif ($first2 === $UTF16_BIG_ENDIAN_BOM) {
            $dataEncoding = 'UTF-16BE';
        } elseif ($first2 === $UTF16_LITTLE_ENDIAN_BOM) {
            $dataEncoding = 'UTF-16LE';
        } else {
            throw new \Exception('Whoa! No idea what that was.');
        }
    }
    if ($dataEncoding === 'UTF-8'){
        return $data;
    } else {
        return \mb_convert_encoding(
            $data,
            'UTF-8',
            $dataEncoding
        );
    }
};
$utf8Data = $ensureIsUTF8(\file_get_contents('something'));
$utf8Data = $ensureIsUTF8(\file_get_contents('http://somethingElse'));
$utf8Data = $ensureIsUTF8($userProvidedData);
?>
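To sanity-check the conversion step on its own, here is a minimal sketch. The sample bytes are made up for illustration and are not from the original post: the word "Привет" encoded as windows-1251, followed by a middle dot (0xB7). It only assumes the mbstring extension is loaded, and it passes the source encoding explicitly instead of relying on detection.

```php
<?php
// Hypothetical sample input: "Привет" in windows-1251 plus a middle dot (0xB7).
$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2\xB7";

// The raw bytes are not valid UTF-8 (0xCF is not followed by a continuation byte).
$isUtf8Before = mb_check_encoding($cp1251, 'UTF-8');

// Re-encode: the characters stay the same, only the byte representation changes.
$utf8 = mb_convert_encoding($cp1251, 'UTF-8', 'Windows-1251');

// After conversion the string validates as UTF-8 and has 7 characters.
$isUtf8After = mb_check_encoding($utf8, 'UTF-8');
$length = mb_strlen($utf8, 'UTF-8');

var_dump($isUtf8Before, $isUtf8After, $length);
```

The middle dot comes through as U+00B7 rather than being replaced or dropped, so punctuation like this should not need special handling once the source encoding is known.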
What encoding are you going *from*? – Brad
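I'm not sure there _is_ a better way than yours? You aren't just translating the character encoding, you're translating the characters themselves: actually converting things like 'Â' to 'A', rather than using utf-8 from end to end (which would mean you wouldn't need to translate characters at all and could quite happily leave 'Â' in the source). – CD001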
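Indeed, we're coming from anything, since our users can upload documents and pull in information from other sources... Most of the time we get it as 'iso-8859-1' and 'windows-1251', but it's not just letters; we also have crazy middot characters and so on... – zeroasterisk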