2011-06-20

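Python special characters in strings

I am trying to display emails in a web page. The program is written in Python. Unfortunately, I have some character encoding problems: the text contains curly single and double quotes.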

Original message:

“All is good”
‘it is getting better’ 

With character set 'windows-1252' I get from the ISP:

=93All is good=94 
=91it is getting better=92 

With character set 'utf-8' I get from the ISP:

=E2=80=9CAll is good=E2=80=9D 
=E2=80=98it is getting better=E2=80=99 
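For reference, here is a minimal Python 3 sketch (the question itself uses Python 2) of decoding both ISP samples without manual replacement. First it undoes the quoted-printable transfer encoding with the standard `quopri` module, then it decodes the resulting bytes with the charset the message declares. Both variants should end up as the same Unicode text.

```python
import quopri

# Quoted-printable bodies as received from the ISP (copied from the question).
cp1252_qp = b"=93All is good=94\n=91it is getting better=92"
utf8_qp = b"=E2=80=9CAll is good=E2=80=9D\n=E2=80=98it is getting better=E2=80=99"

# Step 1: undo the transfer encoding, giving raw bytes in the part's charset.
cp1252_bytes = quopri.decodestring(cp1252_qp)
utf8_bytes = quopri.decodestring(utf8_qp)

# Step 2: decode those bytes with the charset named in the Content-Type header.
text_from_cp1252 = cp1252_bytes.decode("windows-1252")
text_from_utf8 = utf8_bytes.decode("utf-8")

print(text_from_cp1252 == text_from_utf8)  # True
```

The `=XX` replacement step is exactly what `quopri` does. The part that matters is step 2: the bytes only become text once they are decoded with the right charset.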

I replace each =.. sequence with the corresponding hex character. Afterwards, the text looks like:

character set 'windows-1252' 
ôAll is goodö 
æit is getting betterÆ 


character set 'utf-8' 
ΓÇ£All is goodΓÇ¥ 
ΓÇÿit is getting betterΓÇÖ 
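The garbled lines above are consistent with the bytes being correct and simply displayed on a console that uses code page 437. That is an assumption, because the question does not say how the text was shown. The small Python 3 check below decodes the same bytes as cp437 and reproduces exactly those characters:

```python
# Bytes after undoing quoted-printable (see the ISP samples above).
cp1252_bytes = b"\x93All is good\x94"
utf8_bytes = b"\xe2\x80\x9cAll is good\xe2\x80\x9d"

# Assumption: the console renders bytes using code page 437.
# Interpreting the bytes that way yields the mojibake from the question.
shown_cp1252 = cp1252_bytes.decode("cp437")  # 'ôAll is goodö'
shown_utf8 = utf8_bytes.decode("cp437")      # 'ΓÇ£All is goodΓÇ¥'

# Interpreting them with the charset the message declares gives curly quotes.
same_text = cp1252_bytes.decode("windows-1252") == utf8_bytes.decode("utf-8")
```

If this reading is right, the hex replacement worked. Only the final display step was using the wrong code page.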

A subsequent call to the unicode function fails with

UnicodeEncodeError: 'charmap' codec can't encode character u'\u201d' in position 6: character maps to <undefined>

or something similar.

The call looks like unicode(message, 'utf-8', 'replace'). Any idea what I am doing wrong?
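The 'charmap' error is raised while *encoding*, not decoding. It happens when Unicode text is converted to a legacy code page that has no mapping for a character such as U+201D. The Python 3 sketch below assumes the failing step was writing the text to a cp437 console, which the question does not actually show. It also shows two ways to produce bytes that are safe for a web page, and what `'replace'` does when windows-1252 bytes are decoded as UTF-8:

```python
text = "\u201cAll is good\u201d"

# Assumption: the failing step encodes to the console's code page (cp437).
# cp437 has no curly quotes, so this raises the same kind of error.
error = None
try:
    text.encode("cp437")
except UnicodeEncodeError as exc:
    error = exc  # exc.encoding == 'charmap'

# For a web page: send UTF-8 bytes and declare charset=utf-8 in the page...
utf8_body = text.encode("utf-8")
# ...or fall back to HTML numeric character references in plain ASCII.
ascii_body = text.encode("ascii", "xmlcharrefreplace")  # b'&#8220;All is good&#8221;'

# Decoding windows-1252 bytes as UTF-8 with 'replace' silently loses the quotes.
lossy = b"\x93All is good\x94".decode("utf-8", "replace")  # '\ufffdAll is good\ufffd'
```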


Please format your code properly next time, and please check my edit of your code to see whether I made any mistakes. Thanks! – Trufa

Answers


Why are you replacing anything at all?

>>> import email
>>> m = email.message_from_string('''Content-Type: text/plain; charset=utf-8\nContent-Transfer-Encoding: quoted-printable\n\n=E2=80=9CAll is good=E2=80=9D\n=E2=80=98it is getting better=E2=80=99''')
>>> m.get_payload(decode=True).decode(m.get_content_charset())
u'\u201cAll is good\u201d\n\u2018it is getting better\u2019'
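In Python 3 the same approach looks roughly like this. This sketch reads the charset with `Message.get_content_charset()` rather than splitting the header string by hand, and it falls back to `us-ascii` if no charset is declared (that fallback is an illustrative choice):

```python
import email

raw = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "=E2=80=9CAll is good=E2=80=9D\n"
    "=E2=80=98it is getting better=E2=80=99"
)

msg = email.message_from_string(raw)
payload_bytes = msg.get_payload(decode=True)          # quoted-printable undone, as bytes
charset = msg.get_content_charset() or "us-ascii"     # 'utf-8' for this message
text = payload_bytes.decode(charset)                  # proper Unicode text
```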

Because when I tried this, I ran into problems. Here is another attempt:

The code looks like:

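import email
import email.iterators

# lines is already prefilled with a valid HTML message
m = email.message_from_string(lines)
email.iterators._structure(m)       # prints the MIME tree (debugging helper)
print(m.is_multipart())
print(m.get_payload(decode=True))   # None for a multipart message
print(m.get_payload())              # list of sub-Message objects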

The output looks like:

>>> execfile('email2.py')
multipart/alternative
    text/plain
    text/html

[<email.message.Message instance at 0x0235FDF0>, <email.message.Message instance at 0x02355F08>]

As you can see, it fails if I use decode=True. Here is the simplified email:

Content-Type: multipart/alternative; 
    boundary="----=_NextPart_000_0130_01CC1E30.41026040" 

This is a multi-part message in MIME format. 

------=_NextPart_000_0130_01CC1E30.41026040 
Content-Type: text/plain; 
    charset="utf-8" 
Content-Transfer-Encoding: quoted-printable 

plain 

------=_NextPart_000_0130_01CC1E30.41026040 
Content-Type: text/html; 
    charset="utf-8" 
Content-Transfer-Encoding: quoted-printable 

html 

------=_NextPart_000_0130_01CC1E30.41026040--
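For a multipart message, calling `get_payload(decode=True)` on the container returns `None` by design, so decoding has to happen on each leaf part. The minimal Python 3 sketch below uses the simplified message above. The curly quotes from the question have been added to the plain-text body purely for illustration, and the `bodies` dictionary is a hypothetical name:

```python
import email

# Simplified message from the question; the quoted-printable curly quotes in
# the plain-text part are added for illustration only.
raw = """Content-Type: multipart/alternative;
    boundary="----=_NextPart_000_0130_01CC1E30.41026040"

This is a multi-part message in MIME format.

------=_NextPart_000_0130_01CC1E30.41026040
Content-Type: text/plain;
    charset="utf-8"
Content-Transfer-Encoding: quoted-printable

=E2=80=9Cplain=E2=80=9D

------=_NextPart_000_0130_01CC1E30.41026040
Content-Type: text/html;
    charset="utf-8"
Content-Transfer-Encoding: quoted-printable

html

------=_NextPart_000_0130_01CC1E30.41026040--
"""

msg = email.message_from_string(raw)

# On the multipart container there is nothing to decode: Python returns None.
container_payload = msg.get_payload(decode=True)

# Decode each leaf part with its own transfer encoding and charset.
bodies = {}
for part in msg.walk():
    if part.is_multipart():
        continue  # skip the container itself
    data = part.get_payload(decode=True)                # bytes, QP undone
    charset = part.get_content_charset() or "us-ascii"  # 'utf-8' here
    bodies[part.get_content_type()] = data.decode(charset, "replace")
```

For a web page, `bodies["text/html"]` (or the escaped plain-text body) can then be encoded as UTF-8 in a page that declares that charset.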