2013-03-02 75 views
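PHP's json_encode and character representation

I'll try to present this as simply as possible: I'm using json_encode() to encode several UTF-8 strings in different languages, and I noticed that characters are left unchanged when they belong to the ASCII table, while everything else comes back as '\unnnn', where 'nnnn' is a hexadecimal number.

See the code: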

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8" /> 
<title>Multibyte string functions</title> 
</head> 
<body> 
<h3>Multibyte string functions</h3> 
<p> 
<?php 
//present json encode errors nicely: 
//assign integer values to keys and error names to values 
echo '<br /><b>Define JSON errors</b><br />'; 
$constants = get_defined_constants(true); 
$json_errors = array(); 
foreach ($constants["json"] as $name => $value) { 
    if (!strncmp($name, "JSON_ERROR_", 11)) { 
     $json_errors[$value] = $name; 
    } 
} 
echo nl2br(print_r($json_errors, true), true); 

//Display current detection order 
echo "<br /><b>Current detection order 'mb_detect_order()':</b> ", implode(", ", mb_detect_order()); 
//Display internal encoding 
echo "<br /><b>Internal encoding 'mb_internal_encoding()':</b> ", mb_internal_encoding(); 
//Get current language 
echo "<br /><b>Current detection language 'mb_language()' ('neutral' for utf8):</b> ", mb_language(); 

//our test data 
//a nowdoc that can break a <input> field; 
$str = <<<'STR' 
O'Reilly(\n) "& 'Big\Two @ <span>bo\tld</span>" 
STR; 
$strings = array(
    $str, 
    "Latin: tell me the answer and I might find the question!", 
    "Greek: πες μου την ερώτηση και ίσως βρω την απάντηση!", 
    "Chinese simplified: 告诉我答复,并且我也许发现问题!", 
    "Arabic: أخبرني الاجابة, انا قد تجد مسالة!", 
    "Portuguese: mais coisas a pensar sobre diário ou dois!", 
    "French: plus de choses à penser à journalier ou à deux!", 
    "Spanish: ¡más cosas a pensar en diario o dos!", 
    "Italian: più cose da pensare circa giornaliere o due!", 
    "Danish: flere ting å tenke på hver dag eller to!", 
    "Chech: Další věcí, přemýšlet o každý den nebo dva!", 
    "German: mehr über Spaß spät schönen", 
    "Albanian: më vonë gjatë fun bukur", 
    "Hungarian: több mint szórakozás késő csodálatos kenyér" 
); 

//show encoding and then encode 
foreach($strings as $string){ 
    echo "<br /><br />$string :", mb_detect_encoding($string); 
    $json = json_encode($string); 
    echo "<br />Error? ", $json_errors[json_last_error()]; 
    echo '<br />json=', $json; 
} 
?> 
</p> 
</body> 
</html> 

The code above outputs:

Define JSON errors 
Array 
(
[0] => JSON_ERROR_NONE 
[1] => JSON_ERROR_DEPTH 
[2] => JSON_ERROR_STATE_MISMATCH 
[3] => JSON_ERROR_CTRL_CHAR 
[4] => JSON_ERROR_SYNTAX 
[5] => JSON_ERROR_UTF8 
) 

Current detection order 'mb_detect_order()': ASCII, UTF-8 
Internal encoding 'mb_internal_encoding()': ISO-8859-1 
Current detection language 'mb_language()' ('neutral' for utf8): neutral 

O'Reilly(\n) "& 'Big\Two @ bo\tld" :ASCII 
Error? JSON_ERROR_NONE 
json="O'Reilly(\\n) \"& 'Big\\Two @ bo\\tld<\/span>\"" 

Latin: tell me the answer and I might find the question! :ASCII 
Error? JSON_ERROR_NONE 
json="Latin: tell me the answer and I might find the question!" 

Greek: πες μου την ερώτηση και ίσως βρω την απάντηση! :UTF-8 
Error? JSON_ERROR_NONE 
json="Greek: \u03c0\u03b5\u03c2 \u03bc\u03bf\u03c5 \u03c4\u03b7\u03bd \u03b5\u03c1\u03ce\u03c4\u03b7\u03c3\u03b7 \u03ba\u03b1\u03b9 \u03af\u03c3\u03c9\u03c2 \u03b2\u03c1\u03c9 \u03c4\u03b7\u03bd \u03b1\u03c0\u03ac\u03bd\u03c4\u03b7\u03c3\u03b7!" 

Chinese simplified: 告诉我答复,并且我也许发现问题! :UTF-8 
Error? JSON_ERROR_NONE 
json="Chinese simplified: \u544a\u8bc9\u6211\u7b54\u590d\uff0c\u5e76\u4e14\u6211\u4e5f\u8bb8\u53d1\u73b0\u95ee\u9898!" 

Arabic: أخبرني الاجابة, انا قد تجد مسالة! :UTF-8 
Error? JSON_ERROR_NONE 
json="Arabic: \u0623\u062e\u0628\u0631\u0646\u064a \u0627\u0644\u0627\u062c\u0627\u0628\u0629, \u0627\u0646\u0627 \u0642\u062f \u062a\u062c\u062f \u0645\u0633\u0627\u0644\u0629!" 

Portuguese: mais coisas a pensar sobre diário ou dois! :UTF-8 
Error? JSON_ERROR_NONE 
json="Portuguese: mais coisas a pensar sobre di\u00e1rio ou dois!" 

French: plus de choses à penser à journalier ou à deux! :UTF-8 
Error? JSON_ERROR_NONE 
json="French: plus de choses \u00e0 penser \u00e0 journalier ou \u00e0 deux!" 

Spanish: ¡más cosas a pensar en diario o dos! :UTF-8 
Error? JSON_ERROR_NONE 
json="Spanish: \u00a1m\u00e1s cosas a pensar en diario o dos!" 

Italian: più cose da pensare circa giornaliere o due! :UTF-8 
Error? JSON_ERROR_NONE 
json="Italian: pi\u00f9 cose da pensare circa giornaliere o due!" 

Danish: flere ting å tenke på hver dag eller to! :UTF-8 
Error? JSON_ERROR_NONE 
json="Danish: flere ting \u00e5 tenke p\u00e5 hver dag eller to!" 

Chech: Další věcí, přemýšlet o každý den nebo dva! :UTF-8 
Error? JSON_ERROR_NONE 
json="Chech: Dal\u0161\u00ed v\u011bc\u00ed, p\u0159em\u00fd\u0161let o ka\u017ed\u00fd den nebo dva!" 

German: mehr über Spaß spät schönen :UTF-8 
Error? JSON_ERROR_NONE 
json="German: mehr \u00fcber Spa\u00df sp\u00e4t sch\u00f6nen" 

Albanian: më vonë gjatë fun bukur :UTF-8 
Error? JSON_ERROR_NONE 
json="Albanian: m\u00eb von\u00eb gjat\u00eb fun bukur" 

Hungarian: több mint szórakozás késő csodálatos kenyér :UTF-8 
Error? JSON_ERROR_NONE 
json="Hungarian: t\u00f6bb mint sz\u00f3rakoz\u00e1s k\u00e9s\u0151 csod\u00e1latos keny\u00e9r" 

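As you can see, for most languages other than English the non-ASCII UTF-8 characters are converted to hexadecimal escapes. Is it possible to encode without replacing my Unicode characters? Is it safe? What do other people do?

Keep in mind that the strings come from user input on a page and are stored in MySQL.

Thank you.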
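Since the strings come from user input and a database, it can help to confirm that they really are UTF-8 before encoding, because json_encode() reports JSON_ERROR_UTF8 for anything else. The following is only a minimal sketch: to_utf8() is a hypothetical helper, and the fallback to ISO-8859-1 is an assumption made for illustration that will not hold for every application.

```php
<?php
// Hypothetical helper: return $str unchanged if it is valid UTF-8,
// otherwise convert it, ASSUMING the bytes are ISO-8859-1.
function to_utf8($str) {
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    return mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "di\xE1rio"; // "diário" as ISO-8859-1 bytes, which is not valid UTF-8

// Encoding the raw bytes fails; @ hides the warning some PHP versions emit.
@json_encode($latin1);
var_dump(json_last_error() === JSON_ERROR_UTF8);

// After conversion the string encodes normally.
echo json_encode(to_utf8($latin1)), "\n";
```

The same check could be applied to values read back from MySQL if the connection character set is not known to be UTF-8.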

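Why does this bother you? It is valid JSON, and anything that even pretends to handle JSON should be able to process it without any problem. Unless you are using JSON for something it is not meant for... like escaping queries. – Wrikken 2013-03-05 17:59:53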

What is the issue? Does the escaped version cause any problems? – TRiG 2013-03-05 18:01:17

Answers

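OK, thank you all very much for your answers!

The problem is that I am on PHP Version 5.3.10, so json_encode($string, JSON_UNESCAPED_UNICODE) is not an option (the JSON_UNESCAPED_UNICODE flag was only added in PHP 5.4.0).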
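For readers who are on PHP 5.4.0 or later, here is a minimal sketch of what the flag changes. The sample string is a shortened version of the Greek test string from the question; the version requirement is an assumption about the reader's environment, not something that applies to 5.3.10.

```php
<?php
// Assumption: PHP 5.4.0 or later, where JSON_UNESCAPED_UNICODE is defined.
$string = "Greek: πες μου";

$escaped   = json_encode($string);                         // non-ASCII becomes \uXXXX
$unescaped = json_encode($string, JSON_UNESCAPED_UNICODE); // UTF-8 characters are kept

echo $escaped, "\n";
echo $unescaped, "\n";

// Both forms are valid JSON and decode to the same PHP string.
var_dump(json_decode($escaped) === json_decode($unescaped));
```

Because both outputs decode to the same value, the choice mostly affects payload size and readability rather than correctness.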

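Fortunately, someone called "Mr Swordsteel" posted a comment on the PHP manual page http://www.php.net/manual/en/function.json-encode.php that actually does the trick (thanks, Mr Swordsteel!). The real bonus is that it emulates the json_encode function, and it gives a hint in case we want to port it to another language, such as JavaScript, and keep our libraries communicating.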

function my_json_encode($in){ 
    // Escape only the characters JSON needs escaped; multibyte UTF-8 sequences pass through untouched. 
    // "\v" is not in the list because \v is not a valid JSON escape. 
    $_escape = function ($str) { 
        return addcslashes($str, "\t\n\r\f\"\\/"); 
    }; 
    $out = ""; 
    if (is_object($in)){ 
        // Public properties become a JSON object; each value is encoded recursively. 
        $arr = array(); 
        foreach (get_object_vars($in) as $key => $val){ 
            $arr[] = "\"{$_escape($key)}\":" . my_json_encode($val); 
        } 
        $val = implode(',', $arr); 
        $out .= "{{$val}}"; 
    }elseif (is_array($in)){ 
        // Any non-numeric key turns the array into a JSON object. 
        $obj = false; 
        $arr = array(); 
        foreach($in as $key => $val){ 
            if(!is_numeric($key)){ 
                $obj = true; 
            } 
            $arr[$key] = my_json_encode($val); 
        } 
        if($obj){ 
            foreach($arr as $key => $val){ 
                $arr[$key] = "\"{$_escape($key)}\":{$val}"; 
            } 
            $val = implode(',', $arr); 
            $out .= "{{$val}}"; 
        }else { 
            $val = implode(',', $arr); 
            $out .= "[{$val}]"; 
        } 
    }elseif (is_bool($in)){ 
        $out .= $in ? 'true' : 'false'; 
    }elseif (is_null($in)){ 
        $out .= 'null'; 
    }elseif (is_string($in)){ 
        $out .= "\"{$_escape($in)}\""; 
    }else{ 
        // Integers and floats are written as-is. 
        $out .= $in; 
    } 
    return "{$out}"; 
} 

I have tested it a lot and could not break it! Now re-implementing json_decode would be really interesting!

Thank you.
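One way to test an encoder like this is to check that its output survives a trip through the built-in json_decode(). The sketch below is not part of the original answer: roundtrip_failures() is a hypothetical helper and the sample values are made up for illustration. It is shown with the built-in json_encode as a baseline so that it runs on its own.

```php
<?php
// Hypothetical helper: returns the values that do NOT survive
// an encode -> json_decode round trip with the given encoder.
function roundtrip_failures($encode, array $values) {
    $failures = array();
    foreach ($values as $value) {
        $json    = call_user_func($encode, $value);
        $decoded = json_decode($json, true);
        if (json_last_error() !== JSON_ERROR_NONE || $decoded !== $value) {
            $failures[] = $value;
        }
    }
    return $failures;
}

// Illustrative samples: multibyte text, characters that need escaping,
// a list, a nested associative array, and scalar edge cases.
$samples = array(
    "Greek: πες μου",
    "tab\there, quote \" and backslash \\",
    array(1, 2, 3),
    array("name" => "Spaß", "tags" => array("a", "b")),
    true,
    null,
);

// Baseline with the built-in encoder; an empty array means every sample round-tripped.
var_dump(roundtrip_failures('json_encode', $samples));
```

With my_json_encode defined in the same script, passing 'my_json_encode' as the first argument runs the same samples against the custom function.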