Note the difference:
↓
Glacie%CC%80re_Service-de-lEducation-Ambassade-Chine_map.png
Glaci%C3%A8re_Service-de-lEducation-Ambassade-Chine_map.png
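The difference between the two names can be checked directly. The following is only a minimal sketch: the variable names are mine, and it assumes both names are percent-encoded UTF-8. It decodes each name, shows that the raw strings differ, and shows that they become equal once both are normalized to NFC.

```python
import unicodedata
from urllib.parse import unquote

# The two file names from above (variable names are illustrative only)
decomposed = "Glacie%CC%80re_Service-de-lEducation-Ambassade-Chine_map.png"
precomposed = "Glaci%C3%A8re_Service-de-lEducation-Ambassade-Chine_map.png"

# Undo the percent-encoding (unquote assumes UTF-8 by default)
a = unquote(decomposed)
b = unquote(precomposed)

# The raw strings differ: 'e' + U+0300 versus the single character U+00E8
print(a == b, len(a), len(b))

# After NFC normalization both names are the same string
same = unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
print(same)
```

So the names look identical on screen but are different sequences of code points, which is why a byte-wise comparison or file lookup fails.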
Read about normalization forms in Unicode® Standard Annex #15: UNICODE NORMALIZATION FORMS.
Unfortunately, I don't speak PHP; however, the following Python example could help:
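import unicodedata
import urllib.parse

# Precomposed character U+00E8
x = unicodedata.lookup('Latin Small Letter E With Grave')
print(x, len(x))
# Decompose it into the base letter plus a combining accent
y = unicodedata.normalize('NFKD', x)
print(y, len(y))
# Print each character, its percent-encoding and its Unicode name
for char in (x + ' ' + y):
    print(char, urllib.parse.quote(char, safe='/'), unicodedata.name(char, '?'))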
Output:
==> python
Python 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata,urllib
>>> from urllib import parse
>>>
>>> x = unicodedata.lookup('Latin Small Letter E With Grave')
>>> print(x, len(x))
è 1
>>>
>>> y = unicodedata.normalize('NFKD', x)
>>> print(y, len(y))
è 2
>>>
>>> for char in (x + ' ' + y):
...     print(char, urllib.parse.quote(char, safe='/'),unicodedata.name(char, '?'))
...
è %C3%A8 LATIN SMALL LETTER E WITH GRAVE
%20 SPACE
e e LATIN SMALL LETTER E
̀ %CC%80 COMBINING GRAVE ACCENT
>>>
>>>
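Building on the session above, a decomposed name could be converted to its precomposed form before it is used in a URL. This is only a sketch of one possible approach: `requote_nfc` is a hypothetical helper name, and it assumes the input is percent-encoded UTF-8 and that NFC is the form the target expects.

```python
import unicodedata
from urllib.parse import quote, unquote


def requote_nfc(name):
    """Hypothetical helper: decode, normalize to NFC, percent-encode again."""
    decoded = unquote(name)                          # e.g. 'e' followed by U+0300
    composed = unicodedata.normalize("NFC", decoded)  # becomes U+00E8
    return quote(composed, safe="/")


fixed = requote_nfc("Glacie%CC%80re_Service-de-lEducation-Ambassade-Chine_map.png")
print(fixed)
```

Applied to the first file name, this should give the second one, with %C3%A8 in place of e%CC%80.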
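A screenshot of the output is added because I can't prevent NFKC normalization of the è 2 string in the code example above; see the result of print(y, len(y)):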