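A blatant mojibake case, and one for `hexdump -C`. I once wrote a small `.bat` script that shows how the best-known OEM and ANSI code pages map to Unicode, and vice versa. Here are the results for the byte 0x85: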
==> alts.bat 0x85
CP/ACP Hex Codepoint #Description :show8bit 133 <--> 0x85)
------ --- --------- ------------------------
CP1250 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1251 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1252 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1253 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1254 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1255 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1256 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1257 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1258 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP437 0x85 0x00e0 #LATIN SMALL LETTER A WITH GRAVE
CP737 0x85 0x0396 #GREEK CAPITAL LETTER ZETA
CP775 0x85 0x0123 #LATIN SMALL LETTER G WITH CEDILLA
CP850 0x85 0x00e0 #LATIN SMALL LETTER A WITH GRAVE
CP852 0x85 0x016f #LATIN SMALL LETTER U WITH RING ABOVE
CP855 0x85 0x0401 #CYRILLIC CAPITAL LETTER IO
CP857 0x85 0x00e0 #LATIN SMALL LETTER A WITH GRAVE
CP860 0x85 0x00e0 #LATIN SMALL LETTER A WITH GRAVE
CP861 0x85 0x00e0 #LATIN SMALL LETTER A WITH GRAVE
CP862 0x85 0x05d5 #HEBREW LETTER VAV
CP863 0x85 0x00e0 #LATIN SMALL LETTER A WITH GRAVE
CP864 0x85 0x2500 #FORMS LIGHT HORIZONTAL
CP865 0x85 0x00e0 #LATIN SMALL LETTER A WITH GRAVE
CP866 0x85 0x0415 #CYRILLIC CAPITAL LETTER IE
CP869 0x85 #UNDEFINED
CP874 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP932 0x85 #DBCS LEAD BYTE
CP936 0x85 #DBCS LEAD BYTE
CP949 0x85 #DBCS LEAD BYTE
CP950 0x85 #DBCS LEAD BYTE
==>
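The `.bat` script itself is not shown. As a rough stand-in, the following Python sketch decodes a single byte with Python's built-in codecs and prints one line per code page, much like the table above. The codec list and the `byte_to_codepoints` helper are illustrative assumptions, not the original script.

```python
import unicodedata

# Code pages from the table above, spelled as Python codec names (an assumption:
# Python's built-in codecs stand in for the original .bat script).
CODE_PAGES = [f"cp{n}" for n in range(1250, 1259)] + [
    "cp437", "cp737", "cp775", "cp850", "cp852", "cp855", "cp857",
    "cp860", "cp861", "cp862", "cp863", "cp864", "cp865", "cp866",
    "cp869", "cp874", "cp932", "cp936", "cp949", "cp950",
]
DBCS_CODE_PAGES = {"cp932", "cp936", "cp949", "cp950"}


def byte_to_codepoints(byte_value: int) -> dict[str, str]:
    """Decode one byte in every listed code page and describe the result."""
    raw = bytes([byte_value])
    results = {}
    for cp in CODE_PAGES:
        try:
            char = raw.decode(cp)
        except UnicodeDecodeError:
            # A lone byte that fails in a DBCS codec is treated as a lead byte
            # (true for 0x85); in a single-byte codec it is simply unassigned.
            results[cp] = "DBCS LEAD BYTE" if cp in DBCS_CODE_PAGES else "UNDEFINED"
            continue
        name = unicodedata.name(char, "<no name>")
        results[cp] = f"U+{ord(char):04X} {name}"
    return results


if __name__ == "__main__":
    for cp, description in byte_to_codepoints(0x85).items():
        print(f"{cp:<7} 0x85 {description}")
```

The character names come from `unicodedata.name`, so a few may be worded differently from the comments in the vendor mapping files that the table quotes.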
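And vice versa for the 0x2026 code point (sorry for the messy output: the columns are shifted in the rows for the non-Windows code pages):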
==> alts.bat 0x2026
CP/ACP Hex Codepoint #Description :show16bit 8230 <--> 0x2026
------ --- --------- -------------------------
CP1250 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1251 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1252 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1253 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1254 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1255 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1256 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1257 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP1258 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP874 0x85 0x2026 #HORIZONTAL ELLIPSIS
CP932 0x8163 0x2026 #HORIZONTAL ELLIPSIS
CP936 0xA1AD 0x2026 #HORIZONTAL ELLIPSIS
CP949 0xA1A6 0x2026 #HORIZONTAL ELLIPSIS
CP950 0xA14B 0x2026 #HORIZONTAL ELLIPSIS
macCYRILLIC_CP 0xC9 0x2026 #HORIZONTAL ELLIPSIS
macGREEK_CP 0xC9 0x2026 #HORIZONTAL ELLIPSIS
macICELAND_CP 0xC9 0x2026 #HORIZONTAL ELLIPSIS
macLATIN2_CP 0xC9 0x2026 #HORIZONTAL ELLIPSIS
macROMAN_CP 0xC9 0x2026 #HORIZONTAL ELLIPSIS
macTURKISH_CP 0xC9 0x2026 #HORIZONTAL ELLIPSIS
==>
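The reverse lookup can be sketched the same way: encode one character in each code page and skip the pages that cannot represent it. As before, the codec list and the `codepoint_to_bytes` helper are hypothetical stand-ins for the `.bat` script, using Python's built-in codecs.

```python
# Code pages to try, spelled as Python codec names (an assumption:
# Python's built-in codecs stand in for the original .bat script).
CODE_PAGES = (
    [f"cp{n}" for n in range(1250, 1259)]
    + ["cp437", "cp850", "cp852", "cp866", "cp874", "cp932", "cp936", "cp949", "cp950"]
    + ["mac_cyrillic", "mac_greek", "mac_iceland", "mac_latin2", "mac_roman", "mac_turkish"]
)


def codepoint_to_bytes(char: str) -> dict[str, str]:
    """Encode one character in every listed code page; skip pages that lack it."""
    results = {}
    for cp in CODE_PAGES:
        try:
            encoded = char.encode(cp)
        except UnicodeEncodeError:
            continue  # the character has no mapping in this code page
        results[cp] = "0x" + encoded.hex().upper()
    return results


if __name__ == "__main__":
    for cp, hex_value in codepoint_to_bytes("\u2026").items():
        print(f"{cp:<13} {hex_value:<7} 0x2026")
```

OEM pages such as cp437 have no ellipsis and simply drop out, while the DBCS pages return two-byte sequences such as 0x8163 for cp932.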
Further reading: Encodings and Code Pages
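The only way this can happen is if the browser ignores the charset the page reports and uses a different one, such as a user-specified override. But I don't know which charset would interpret 0x85 as U+016F. In every CP-12xx/Windows-12xx charset, 0x85 is U+2026 HORIZONTAL ELLIPSIS, and ISO-8859-x does not even assign a printable character to 0x85 (it falls in the C1 control range).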
I found a charset that interprets 0x85 as U+016F: CP852 (DOS Latin-2), not to be confused with [ISO-8859-2](https://en.m.wikipedia.org/wiki/ISO/IEC_8859-2) (ISO Latin-2).
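That search can be automated by decoding the byte with each candidate charset and keeping the ones that produce the target character. A minimal sketch, assuming Python's built-in codecs and a hand-picked, non-exhaustive candidate list; `codecs_decoding_to` is a hypothetical helper.

```python
# Single-byte code pages to search (an assumption: this list is illustrative,
# not exhaustive; Python codec names are used).
CANDIDATES = (
    [f"cp{n}" for n in (437, 737, 775, 850, 852, 855, 857, 860, 861,
                        862, 863, 864, 865, 866, 869, 874)]
    + [f"cp{n}" for n in range(1250, 1259)]
    + [f"iso8859_{n}" for n in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16)]
)


def codecs_decoding_to(byte_value: int, target: str) -> list[str]:
    """Return every candidate codec that decodes byte_value as target."""
    matches = []
    for codec in CANDIDATES:
        try:
            if bytes([byte_value]).decode(codec) == target:
                matches.append(codec)
        except UnicodeDecodeError:
            pass  # the byte is unassigned in this code page
    return matches


if __name__ == "__main__":
    # U+016F LATIN SMALL LETTER U WITH RING ABOVE
    print(codecs_decoding_to(0x85, "\u016f"))  # expected: ['cp852']
    # In ISO-8859-x, 0x85 decodes to the C1 control U+0085 (NEL).
    print(codecs_decoding_to(0x85, "\u0085"))
```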
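Thanks @RemyLebeau. It seems odd that a normally configured browser would treat some of the text as DOS Latin-2, but at least that makes more sense than the "magic" I had come up with. I'll do more testing to see whether I can reproduce it with other characters.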
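One way to run such tests offline is to simulate the mismatch: encode text with the charset the page presumably uses and decode the bytes with the charset the browser presumably applied, printing the raw bytes along the way in the spirit of `hexdump -C`. The pairing of Windows-125x on the page with CP852 in the browser is an assumption taken from the comments above, and `mojibake` and `hex_bytes` are hypothetical helpers.

```python
def mojibake(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one charset and decode the bytes with another,
    as a browser does when it ignores the page's declared charset."""
    raw = text.encode(written_as)
    # errors="replace" turns undefined bytes into U+FFFD instead of raising.
    return raw.decode(read_as, errors="replace")


def hex_bytes(text: str, encoding: str) -> str:
    """Space-separated hex bytes, like the middle columns of `hexdump -C`."""
    return text.encode(encoding).hex(" ")


if __name__ == "__main__":
    sample = "\u2026"  # HORIZONTAL ELLIPSIS
    print(hex_bytes(sample, "cp1252"))          # the single byte 0x85
    print(hex_bytes(sample, "utf-8"))           # three bytes in UTF-8
    print(mojibake(sample, "cp1252", "cp852"))  # U+016F, the reported symptom
```

Swapping in other characters for `sample` shows which ones would also change under the same mismatch.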