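'utf-8' codec can't encode character '\udcc2': surrogates not allowed
I am using Python 3.6.0b2.
I parse a lot of emails. This particular email is a problem because I cannot print the display name of its email address. Trying to print the display name gives: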
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc2' in position 30: surrogates not allowed
Here is a test case, a short piece of code that reproduces the problem:
(venv3.6) [email protected]:/opt/mailripper$ cat test.py
from email import policy
from email.headerregistry import Address
from email.parser import BytesHeaderParser, BytesParser
email_bytes = b'From: =?utf-8?Q?John_Smith=2C_Prince2=C2=AE=2CPMP=C2=AE=2C_CSM=C2?=\r\n =?utf-8?Q?=AE=2C_ITIL=C2=AE=2C_ISTQB=C2=AE?= <[email protected]>\r\n'
msg = BytesHeaderParser(policy=policy.default).parsebytes(email_bytes)
print(msg['from'])
print(msg['from'].addresses[0].display_name)
Here is the error produced by the code above:
(venv3.6) [email protected]:/opt/mailripper$ python test.py
"John Smith, Prince2®,PMP®, CSM� �, ITIL®, ISTQB®" <[email protected]>
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    print(msg['from'].addresses[0].display_name)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc2' in position 30: surrogates not allowed
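Looking at the header bytes, the two-byte UTF-8 sequence for ® (C2 AE) is split across the two encoded-words: the first ends with =C2 and the second starts with =AE. The sketch below shows how decoding each half on its own with the surrogateescape error handler yields lone surrogates that then cannot be encoded. That the email package decodes the chunks separately like this is my assumption from the output, not something I have confirmed in its source; the chunk values are shortened for illustration.

```python
# Shortened payloads around the split, after Q-decoding (illustrative values).
first_chunk = b'CSM\xc2'       # ends with the first byte of the sequence C2 AE
second_chunk = b'\xae, ITIL'   # starts with the second byte

# Decoded separately, neither chunk holds a complete sequence, so
# surrogateescape maps each stray byte to a lone surrogate (U+DCC2, U+DCAE).
decoded = (first_chunk.decode('utf-8', 'surrogateescape')
           + second_chunk.decode('utf-8', 'surrogateescape'))

# Strict UTF-8 encoding of that string (what print does here) is rejected.
try:
    decoded.encode('utf-8')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Joining the raw bytes first and decoding once gives the intended text.
joined = (first_chunk + second_chunk).decode('utf-8')
```

Here `error.reason` is "surrogates not allowed", the same message as in the traceback, while `joined` contains the ® sign intact.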
Here is the display name as shown by the OS X email client, which seems to parse it fine (this is a screenshot, cropped small):
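My goal is to be able to process any email without Unicode errors, and without writing custom Unicode error handling code. Is this possible?
Can anyone suggest what I can do to avoid Unicode errors when printing the display name of an email address?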
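For reference, the kind of workaround I am hoping to avoid would look roughly like the sketch below. The helper name repair_display_name is made up for illustration, and it assumes the lone surrogates were produced by the surrogateescape error handler, so each one stands for a single original byte. It would be applied to the display_name string before printing.

```python
def repair_display_name(name):
    """Hypothetical helper: make a display name safe to print.

    Assumes any lone surrogates came from 'surrogateescape' (U+DC80..U+DCFF).
    Surrogates outside that range would still raise UnicodeEncodeError.
    """
    # Turn each escaped surrogate back into the raw byte it stands for.
    raw = name.encode('utf-8', 'surrogateescape')
    # Decode the whole byte string at once, so a sequence that was split
    # across chunks is reassembled; truly invalid bytes become U+FFFD.
    return raw.decode('utf-8', 'replace')


# A name shaped like the failing one: the bytes C2 AE appear as two surrogates.
fixed = repair_display_name('CSM\udcc2\udcae, ITIL\u00ae')
```

If the surrogates are not adjacent (for example if a space ends up between them), the bytes cannot be rejoined and each is replaced with U+FFFD rather than raising an error.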