使python默認使用字符串替換不可編碼的字符

我想讓python忽略不能編碼的字符，只需用字符串"<could not encode>"替換即可。使python默認使用字符串替換不可編碼的字符

E.g，假設默認編碼是ASCII，命令

'%s is the word'%'ébác'

會產生

'<could not encode>b<could not encode>c is the word'

有什麼辦法，使這個默認的行爲，在所有我的項目？

來源

2009-12-19 olamundo

如果默認編碼是ascii，那麼''ébác''字符串的編碼是什麼？ –

@Peter Hansen - 你是對的:)它只是解釋我想要的......不好的例子。 – olamundo

的str.encode函數採用限定所述錯誤處理的可選參數：

str.encode([encoding[, errors]])

從文檔：

返回字符串的編碼版本。默認編碼是當前的默認字符串編碼。可能會給出錯誤來設置不同的錯誤處理方案。錯誤的默認值是'strict'，這意味着編碼錯誤會引發UnicodeError。其他可能的值有'ignore'，'replace'，'xmlcharrefreplace'，'backslashreplace'以及通過codecs.register_error（）註冊的任何其他名稱，請參見編解碼器基類。有關可能的編碼列表，請參見標準編碼部分。

在你的情況下，codecs.register_error函數可能是感興趣的。

[備註壞字符]

順便說一句，請注意使用register_error時，你可能會發現自己與你的字符串替換不只是個別壞人的角色，但連續的壞字符組，除非你支付注意。每次運行不好的字符都會得到一個錯誤處理程序的調用，而不是每個字符。

來源

2009-12-19 15:39:20 miku

在[這個Python測試文件]（https://github.com/python-git/python/blob/master/Lib/test/test_codeccallbacks.py）中有一些如何使用'codecs.register_error'的例子。 –

>>> help("".encode) 
Help on built-in function encode: 

encode(...) 
S.encode([encoding[,errors]]) -> object 

Encodes S using the codec registered for encoding. encoding defaults 
to the default encoding. errors may be given to set a different error 
handling scheme. Default is 'strict' meaning that encoding errors raise 
a UnicodeEncodeError. **Other possible values are** 'ignore', **'replace'** and 
'xmlcharrefreplace' as well as any other name registered with 
codecs.register_error that is able to handle UnicodeEncodeErrors.

所以，舉例來說：

>>> x 
'\xc3\xa9b\xc3\xa1c is the word' 
>>> x.decode("ascii") 
Traceback (most recent call last): 
File "<stdin>", line 1, in <module> 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128) 
>>> x.decode("ascii", "replace") 
u'\ufffd\ufffdb\ufffd\ufffdc is the word'

添加您自己的回調codecs.register_error與您所選擇的字符串替換。

來源

2009-12-19 15:45:28

使python默認使用字符串替換不可編碼的字符

回答

相關問題