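Parsing Gmail emails with Python when the body contains unicode characters
I wrote a script that parses emails. It works fine for messages received from the Mac OS X Mail client (the only client I have tested so far), but my parser fails when a message contains unicode letters in its body.
For example, I sent a message with the content ąčę.
Here is the part of my script that parses both the body and the attachments: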
from email.feedparser import FeedParser
from django.core.files.base import ContentFile

p = FeedParser()
p.feed(msg)  # msg is the raw message text at this point
msg = p.close()
attachments = []
body = None
for part in msg.walk():
    # Skip multipart containers; only the leaf parts carry content
    if part.get_content_type().startswith('multipart/'):
        continue
    try:
        filename = part.get_filename()
    except Exception:
        # unicode letters in filename, set default name then
        filename = 'Mail attachment'
    if part.get_content_type() == "text/plain" and not body:
        # Returns raw bytes; the part's charset is not applied here
        body = part.get_payload(decode=True)
    elif filename is not None:
        content_type = part.get_content_type()
        attachments.append(ContentFile(part.get_payload(decode=True), filename))
if body is None:
    body = ''
As I mentioned, it works with messages from OS X Mail, but not with messages from Gmail.
Traceback:
Traceback (most recent call last):
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/core/handlers/base.py", line 116, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
    return view_func(*args, **kwargs)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/views/decorators/http.py", line 41, in inner
    return func(request, *args, **kwargs)
  File "/Users/aemdy/PycharmProjects/rezervavau/bms/messages/views.py", line 66, in accept
    Message.accept(request.POST.get('msg'))
  File "/Users/aemdy/PycharmProjects/rezervavau/bms/messages/models.py", line 261, in accept
    thread=thread
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/manager.py", line 149, in create
    return self.get_query_set().create(**kwargs)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/query.py", line 391, in create
    obj.save(force_insert=True, using=self.db)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/base.py", line 532, in save
    force_update=force_update, update_fields=update_fields)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/base.py", line 627, in save_base
    result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/manager.py", line 215, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/query.py", line 1633, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 920, in execute_sql
    cursor.execute(sql, params)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/backends/util.py", line 47, in execute
    sql = self.db.ops.last_executed_query(self.cursor, sql, params)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/operations.py", line 201, in last_executed_query
    return cursor.query.decode('utf-8')
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 115: invalid continuation byte
My script gives me the following body: ����. How can I decode it to get ąčę back?
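The string with the special characters is being sent encoded as Latin-1, and you are trying to interpret it as UTF-8, which obviously fails. –
But I need a general way to parse the body of an email. How can I do that? Decoding it as latin-1 gives 'àèæë' – aemdy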
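One general approach is to decode each text part with the charset declared in its own Content-Type header instead of assuming UTF-8 or Latin-1. The sketch below is a minimal illustration, not a drop-in fix. `decode_text_part`, `RAW_SAMPLE`, and the UTF-8 fallback are hypothetical names and choices. The sample also assumes the Gmail message was sent as ISO-8859-13, which would be consistent with the 0xe0 byte in the traceback and with the 'àèæë' seen when decoding as Latin-1, but the question does not confirm it. Only `get_payload(decode=True)` and `get_content_charset()` from the standard `email` package are used.

```python
import email

# Hypothetical sample message: the body bytes E0 E8 E6 EB are assumed to be
# ISO-8859-13 text, sent with a quoted-printable transfer encoding.
RAW_SAMPLE = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=ISO-8859-13\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "=E0=E8=E6=EB\n"
)


def decode_text_part(part, fallback="utf-8"):
    """Return the text of a non-multipart message part as unicode.

    Hypothetical helper: it undoes the transfer encoding first, then decodes
    the resulting bytes with the charset declared by the part itself.
    """
    payload = part.get_payload(decode=True)  # bytes, or None for multipart
    if payload is None:
        return u""
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header does not declare one.
    charset = part.get_content_charset() or fallback
    try:
        return payload.decode(charset, "replace")
    except LookupError:
        # The declared charset is unknown to Python; use the fallback.
        return payload.decode(fallback, "replace")


sample_text = decode_text_part(email.message_from_string(RAW_SAMPLE))
```

In the loop from the question, `body = decode_text_part(part)` would take the place of the raw `part.get_payload(decode=True)` call for the text/plain part, so the value handed to the database is already unicode text rather than bytes in an unknown encoding.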