Python email encoding problem

I am fetching emails from Gmail using the following:

import email
import imaplib
import sys

# username, password and subject are assumed to be defined elsewhere.

def getMsgs():
    try:
        conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
    except Exception:
        print('Failed to connect')
        print('Is your internet connection working?')
        sys.exit()
    try:
        conn.login(username, password)
    except Exception:
        print('Failed to login')
        print('Is the username and password correct?')
        sys.exit()

    conn.select('Inbox')
    # typ, data = conn.search(None, '(UNSEEN SUBJECT "%s")' % subject)
    typ, data = conn.search(None, '(SUBJECT "%s")' % subject)
    for num in data[0].split():
        typ, data = conn.fetch(num, '(RFC822)')
        # On Python 3 fetch returns bytes, hence message_from_bytes
        msg = email.message_from_bytes(data[0][1])
        yield walkMsg(msg)

def walkMsg(msg):
    # Return the raw (still transfer-encoded) payload of the first text/plain part
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        return part.get_payload()

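However, for some emails it is nearly impossible to extract the dates (I am using regular expressions), because encoding-related characters such as '=' land at random in the middle of various text fields. Here is an example where one occurs inside the date range I want to extract:

Name: Kirsti Email: [email protected] Phone #: + 999 99995192 Total in party: 4 total, 0 children Arrival/Departure: Oct 9= , 2010 - Oct 13, 2010 - Oct 13, 2010

Is there a way to remove these encoding characters?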

2010-10-28 timbo

Yeah... I think it puts those in wherever a line gets wrapped. There should be a lib to decode it properly. – mpen 2010-10-28 07:13:57
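To illustrate the comment above, here is a minimal sketch of where the stray '=' comes from. It assumes Python 3 and the standard-library quopri module; the sample text is made up. Quoted-printable limits encoded lines to 76 characters, so a longer line is wrapped with a trailing '=' (a "soft line break") that a decoder is expected to remove.

```python
import quopri

# Made-up sample: a single 100-character line with no newline in it.
original = b"x" * 100

# Encoding wraps the long line; each wrapped line ends with "=" + newline.
encoded = quopri.encodestring(original)
print(encoded)

# Decoding removes the soft line breaks and restores the original bytes.
restored = quopri.decodestring(encoded)
print(restored == original)
```

If the text is read without decoding, those '=' markers stay in the body, which seems to match the symptom in the question.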

You could/should use the email.parser module to decode mail messages, for example:

from email.parser import FeedParser

f = FeedParser()
f.feed("<insert mail message here, including all headers>")
rootMessage = f.close()

# Now you can access the message and its submessages (if it's multipart)
print(rootMessage.is_multipart())

# Or check for errors
print(rootMessage.defects)

# If it's a multipart message, you can get the first submessage and then its payload
# (i.e. content) like so:
rootMessage.get_payload(0).get_payload(decode=True)

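With the "decode" parameter of Message.get_payload (quick and dirty example!), the module automatically decodes the content according to its transfer encoding (for example quoted-printable, as in your question).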
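For a more complete picture, here is a minimal sketch of the same idea applied to the text/plain part of a message. It assumes Python 3; the RAW message, its addresses and the helper name first_plain_text are made up for illustration. On Python 3, get_payload(decode=True) returns bytes, so the sketch also converts them to text using the charset declared in the part, falling back to ASCII.

```python
import email

# Hypothetical sample message with a quoted-printable body. The "=" at the
# end of the first body line is a soft line break.
RAW = (
    b"From: sender@example.com\n"
    b"Subject: Booking\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"Arrival/Departure: Oct 9=\n"
    b", 2010 - Oct 13, 2010\n"
)

def first_plain_text(msg):
    """Return the decoded text of the first text/plain part, or None."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes the Content-Transfer-Encoding and yields bytes
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "ascii"
        return payload.decode(charset, errors="replace")
    return None

msg = email.message_from_bytes(RAW)
text = first_plain_text(msg)
print(text)
```

With the body decoded this way, a date regex sees the range as one continuous string rather than one broken up by '='.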

2010-10-28 07:10:31 AndiDog

decode=True does not work when the charset is us-ascii. – Ale 2012-11-07 18:47:03

This is called quoted-printable encoding. You probably want to use something like quopri.decodestring - http://docs.python.org/library/quopri.html
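A minimal sketch of that approach, assuming Python 3 and that you already hold the raw quoted-printable body as bytes. The sample strings are made up to resemble the one in the question, and UTF-8 is an assumed charset.

```python
import quopri

# Made-up body resembling the question: "=" + newline is a soft line break.
encoded = b"Arrival/Departure: Oct 9=\n, 2010 - Oct 13, 2010\n"

# decodestring works on bytes; decode the result with the message's charset.
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded)

# Escapes of the form =XX are turned back into the bytes they stand for.
accented = quopri.decodestring(b"caf=C3=A9").decode("utf-8")
print(accented)
```

Note that quopri only undoes the transfer encoding. It does not parse headers or MIME parts, so it fits best when the body has already been isolated.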

2010-10-28 05:45:45
