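What can cause bytes.decode() to behave differently across different installs of Python 3.4?
I'm seeing different behavior when decoding a byte string on Python 3.4.3 across two boxes: one running OS X and the other running Debian Wheezy.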
On OS X:
$ python
Python 3.4.3 (default, Mar 10 2015, 14:53:35)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = b'\xc4\x8dtrn\xc3\xa1ct'
>>> print(s.decode("utf-8"))
čtrnáct
On Debian:
$ python
Python 3.4.3 (default, Apr 4 2015, 22:21:17)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> s = b'\xc4\x8dtrn\xc3\xa1ct'
>>> print(s.decode("utf-8"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u010d' in position 0: ordinal not in range(128)
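Something must be configured slightly differently between the two installs to cause this. I checked the default encoding on both and the result is the same, but I'm not sure what else I can check.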
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
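Note that the traceback above is a UnicodeEncodeError, not a UnicodeDecodeError, so the failure is in print() encoding the decoded string for the terminal rather than in bytes.decode() itself. sys.getdefaultencoding() does not control that step. A minimal sketch of other values worth comparing on both boxes, assuming a hypothetical helper named io_encoding_report (the name is made up for illustration):

```python
import locale
import sys


def io_encoding_report():
    """Collect the encodings Python consults for text I/O (hypothetical helper)."""
    return {
        # Codec used when str.encode()/bytes.decode() get no argument;
        # this is 'utf-8' on Python 3 regardless of locale.
        "default": sys.getdefaultencoding(),
        # Codec print() uses when writing to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding the locale machinery reports for the current settings.
        "preferred": locale.getpreferredencoding(False),
        # Codec used for file names and command-line arguments.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(io_encoding_report().items()):
        print("{}: {}".format(name, value))
```

If "stdout" differs between the two machines, that would be consistent with the behavior shown above.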
Update: `locale` returns different results on the two machines:
OS X:
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
Debian:
$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE=UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
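The Debian output reports that LC_CTYPE cannot be set, which suggests (this is an inference, not something confirmed above) that Python falls back to an ASCII standard output there. The sketch below reproduces the same error without touching the real terminal, by printing into an in-memory stream with an explicit encoding. The helper name write_with is hypothetical.

```python
import io

s = b'\xc4\x8dtrn\xc3\xa1ct'
text = s.decode("utf-8")  # decoding succeeds regardless of the locale


def write_with(encoding):
    """Print `text` into an in-memory stream that uses `encoding`.

    Returns the bytes that would have reached the terminal, or raises
    UnicodeEncodeError if `encoding` cannot represent the text.
    """
    raw = io.BytesIO()
    # newline="\n" keeps the line ending identical on every platform.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    print(write_with("utf-8"))
    try:
        write_with("ascii")
    except UnicodeEncodeError as exc:
        print("ascii stream failed:", exc)
```

With "utf-8" the original bytes come back plus a newline; with "ascii" the print call fails on the first non-ASCII character, as in the Debian session.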
What does 'locale' or 'echo $LANG' output in bash? – 2015-04-04 22:48:59
'echo $LANG' returns nothing on OS X, and 'en_US.UTF-8' on Debian. I've added the 'locale' output as an edit. – Sean 2015-04-04 22:59:06
Try setting 'LC_ALL="en_US.utf8"'; if that does it, run 'sudo locale-gen en_US.UTF-8'. – 2015-04-04 23:11:00
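Before changing LC_ALL, it may help to check from Python whether a given locale name is actually usable on the machine. A rough sketch, assuming a hypothetical helper called locale_is_available; it relies only on locale.setlocale, which raises locale.Error for a name the C library rejects:

```python
import locale


def locale_is_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for candidate in ("C", "UTF-8", "en_US.UTF-8", "en_US.utf8"):
        print(candidate, locale_is_available(candidate))
```

The result for each name depends on which locales are installed on the system, so no particular output is implied here.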